Objective reasons to prefer Linux to Windows
Author:
Nathaniel Beaver
Date:
September 20, 2014
Copyright:
This document is released under a Creative Commons Attribution 4.0 International License.
"Linux is more stable and reliable."
"Linux is more secure and private."
"Linux is faster and less bloated."
"Linux is more flexible and customizable."
"Linux gives you more control over your computer."
Stop it.
Clichés like these are vague and wishy-washy,
and they are founded on anecdotes and hearsay.
They cause endless, unnecessary debates and make a muddle of the facts.
It's easy to opine about one's preferred operating system,
but harder to give objective, concrete examples.
With the caveat that both Windows and Linux are moving targets,
this document describes some specific technical reasons
to prefer using Linux as a desktop operating system.
These reasons are not exhaustive—and not meant to be—but aim to be representative.
This document will not cover servers, phones, or embedded devices.
This document will not cover closed vs. open source development,
but will instead focus on functionality.
There is plenty of discussion
of the advantages and disadvantages of open source elsewhere.
(Besides, what is there to discuss
when we now know that even Microsoft loves open source?)
This discussion will only mention Microsoft and other companies
insofar as their actions are directly relevant
to the technical capabilities of Windows and Linux.
(As an aside, Microsoft gets a lot of guff in the open-source world,
but its behavior is typical for a corporation
whose bottom line relies on sales of proprietary software and devices.
It's economics, not malice.)
The discussion is intended to be as accurate as possible,
at the cost of possible dryness due to technical detail.
I am most familiar with the Debian-based family of Linux distributions,
so my remarks will necessarily touch on these more,
but I have tried to include other distributions when possible.
In this document, the term "Linux" is shorthand for the entire distribution,
including bootloader, kernel, shell, window manager, package manager, etc.
Similarly, the term "Windows"
refers to all default components of modern versions of Microsoft Windows NT,
including Windows XP, Windows Vista, Windows 7, and Windows 8.
Many of the same arguments in favor of Linux
also apply to the BSD family of operating systems
(and POSIX-compliant operating systems in general),
but unfortunately I am not familiar enough with any of them
to comment specifically.
Most people use Windows on the desktop because it's the default.
Few are aware of the benefits of switching to another operating system,
and even fewer are willing to put in the effort to do so.
A Windows user interested in trying Linux
will probably have difficulty finding a coherent reason to do so,
since comparisons of operating systems
tend to be vague, uninformed, or opinion-based.
Even people who know and use Linux by choice
may not do a good job of explaining its benefits to their colleagues,
especially without putting down Windows users
or Windows applications in general.
Also, there are many open source alternatives to Linux on the desktop,
including a binary-compatible clone of Windows called ReactOS.
If it were just a matter of being open source,
why bother with the additional effort to learn Linux?
Even if you don't use Linux or Windows,
it's useful to know where Linux has an edge,
since these issues are relevant to all operating systems.
If you are a new Linux user,
this document is intended to inform you
about some of the benefits of Linux you may not be aware of,
and to provide starting points if you want to dig deeper.
If you are an experienced Linux user,
this document is a test of the theory that the fastest way to get feedback
is to be publicly wrong about something people care about.
Corrections and additions are welcome.
If you are a Windows user:
This document is not intended to convert you to Linux.
(That would be silly.)
This document does not claim that Windows is inferior in every way,
or even that it is inferior overall.
Instead, this is meant to provide insight
into why some people choose to use Linux as a desktop operating system,
despite its shortcomings,
and possibly to challenge some misconceptions
that people have about Linux and Windows.
Corrections and additions are, of course, welcome.
After all, Windows developers are the ones who know the most about its flaws and strengths.
Finally, definitions of better and worse are necessarily subjective,
despite the title's claim of objectivity.
You may heartily disagree with substantial parts of what follows,
but perhaps it may be useful to you, even so.
This is a list of examples of specific limitations
that are the result of the Windows kernel or API.
Windows LiveCDs, though they do exist,
are hampered by licensing restrictions and technical limitations.
For example, until Windows 8, desktop versions of Windows
could not boot from a USB.
(And while running a live USB of Windows 8,
it is still not possible to mount internal hard disks.)
The BartPE LiveCD building program
is third-party software that runs on any version of Windows,
but it is only able to make a LiveCD
for Windows XP or Windows Server 2003.
There is also the WinBuilder project,
which is the closest to a fully-functional LiveCD of modern Windows versions,
but installing software and drivers is still sometimes a challenge.
If the Virtual Machine fails don’t worry too much. Just because the Virtual
Machine fails to boot right does not mean your boot media won’t work, I’ve
seen odd results depending on the amount of memory the VM has and what
drivers I load.
http://www.irongeek.com/i.php?page=security/winbuilder-win7pe-se-tutorial
The absence of fully functional live versions of Windows
makes it difficult to use Windows for tasks such as
determining if a bug is due to hardware or software problems,
recovering data from a machine with filesystem corruption or bad disk sectors,
and testing out different versions of an OS
without making a new hard drive partition.
Live versions of Linux are full operating systems,
able to mount and repartition disks,
connect to the internet and run a web browser,
and even retain settings and data on the next boot-up
(for persistent live USB flash drives).
This makes live versions of Linux useful for
recovering files from damaged hard drives,
making bootable backups of an entire drive,
scanning a disk for malware
without loading a potentially compromised operating system,
distinguishing hardware problems from software problems,
and other tasks requiring a temporary operating system.
Some live Linux distributions, such as Puppy Linux,
are lightweight enough that they default to running from a RAM disk,
and consequently have much faster disk I/O
than an OS that must access a spinning hard drive.
(This comes at the cost of disk space being limited by RAM.
There's no reason you can't mount an internal or external drive
to store files, though.)
Very little hardware comes with a desktop version of Linux pre-installed,
so Linux is almost always installed from live media;
consequently, live versions of Linux tend to work very well.
Similar to live booting,
Linux is often run as a virtual machine,
and consequently it is well-adapted to changes in hardware.
An existing Linux partition on a physical hard drive
can, with some care, be virtualized and run on another machine,
a virtue which Windows does not share.
Windows installations, unlike Linux, cannot easily be moved from one
hardware to another. This is not just due to Microsoft's activation
mechanism but the fact that the installed kernel and drivers depend on the
actual hardware.
https://www.virtualbox.org/wiki/Migrate_Windows
The problem lies with Windows, in that its driver settings, particularly
for storage devices, are not portable. Unless you modify the Windows
registry to force start storage drivers for both the physical and virtual
machines, you will mostly likely end up with a 0x0000007B STOP blue
screen error each time which will require a restore or modifying the
registry to fix.
https://askubuntu.com/questions/174581/is-there-any-way-to-boot-windows-7-partition-in-virtual-machine
It's even possible to transfer a Linux install to a USB enclosure
and boot it directly on another machine of the same architecture,
although the kernel will lack proprietary drivers (e.g. some wifi cards).
Windows path lengths are limited to 260 characters, including the filename.
(In practice, it is often more like 199 characters.)
This is not a flaw in NTFS or Windows per se,
but in the non-Unicode version of the Windows API.
This problem can be avoided by using Unicode versions of the API calls,
but many applications
(e.g. Windows Explorer, .NET and consequently Powershell)
have not done so.
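The Unicode versions of the API calls accept an extended-length path prefix
that bypasses the limit. Here is a minimal Python sketch of the workaround
(the paths are hypothetical, and it is only meaningful on Windows):

    import os

    # The \\?\ prefix routes an absolute path to the Unicode API
    # without the 260-character MAX_PATH limit.
    long_dir = r"C:\data" + r"\subdir" * 50        # well over 260 characters
    extended = "\\\\?\\" + os.path.abspath(long_dir)

    os.makedirs(extended, exist_ok=True)
    with open(extended + r"\notes.txt", "w") as f:
        f.write("created despite exceeding MAX_PATH\n")

Applications that stick to the non-Unicode calls, including much of the
software cited below, never see such paths.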
Of course, most OS restrictions are not an issue in well-written software.
Maybe Windows paths are long enough.
Is MAX_PATH an actual problem in real software?
Judging by the number of bug reports and complaints,
the answer appears to be yes.
https://github.com/joyent/node/issues/6960
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61922
http://llvm.org/bugs/show_bug.cgi?id=20440
https://bugs.eclipse.org/bugs/show_bug.cgi?id=164186
http://bugs.python.org/issue19636
https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=14228
http://social.msdn.microsoft.com/forums/vstudio/en-US/e4a8ee8d-b25d-4b47-8c0c-88329bbece7d/please-increase-maxpath-to-32767
http://stackoverflow.com/questions/1880321/why-does-the-260-character-path-length-limit-exist-in-windows
http://stackoverflow.com/questions/1065993/has-windows-7-fixed-the-255-character-file-path-limit
http://stackoverflow.com/questions/833291/is-there-an-equivalent-to-winapis-max-path-under-linux-unix
http://stackoverflow.com/questions/1858907/svn-command-line-utility-will-not-work-if-full-file-name-is-longer-then-256-char
https://www.itefix.net/content/rsync-file-name-too-long-91
http://sqlite.1065341.n5.nabble.com/Path-Length-Limit-on-Windows-td70642.html
http://sumedha.blogspot.com/2011/01/svn-checkout-fails-windows-max-path.html
http://forums.mozillazine.org/viewtopic.php?f=29&t=263489
But the bigger issue
is that many Windows developers are so used to working around the problem
that it has become deeply entrenched and may never be fixed.
The Linux kernel does have an adjustable pathname length limit;
it's 4096 characters in typical kernels and filesystems.
You can check it by running:
$ getconf PATH_MAX /
However, this limit is not enforced
by any filesystems that Linux runs on,
and consequently some libc implementations
were for a while susceptible to buffer overflow
when trying to resolve canonical file paths.
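The distinction is that PATH_MAX limits the length of a single path string
passed to a system call, not the depth of the directory tree itself.
A short Python sketch (Linux-only, writing under /tmp) makes this concrete:

    import os

    print(os.pathconf("/", "PC_PATH_MAX"))   # typically 4096, like getconf

    os.makedirs("/tmp/deep", exist_ok=True)
    os.chdir("/tmp/deep")
    for _ in range(200):
        os.mkdir("d" * 50)   # each component is 50 characters long
        os.chdir("d" * 50)   # relative paths stay far below PATH_MAX

    # The tree is now over 10000 characters deep. Opening it with one
    # absolute path would fail with ENAMETOOLONG, and os.getcwd() may
    # fail too, but relative access keeps working.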
The 2008 POSIX revision has addressed the issue, but prior to this
the Linux kernel had to make non-standard modifications to avoid overflow,
and warned about the problem in the realpath(3) man page
of the Linux Programmer's Manual.
This illustrates that while the Linux kernel developers
scrupulously avoid breaking external compatibility,
they also intentionally expose false assumptions,
since false assumptions tend to cause hard-to-fix bugs.
This is why Linus Torvalds
chose an unusually high timer interrupt frequency for Linux:
I chose 1000 originally partly as a way to make sure that people that
assumed HZ was 100 would get a swift kick in the pants. That meant making
a _big_ change, not a small subtle one. For example, people tend to react
if "uptime" suddenly says the machine has been up for a hundred days (even
if it's really only been up for ten), but if it is off by just a factor of
two, it might be overlooked.
—Linus Torvalds, Selectable Frequency of the Timer Interrupt (2005)
Linux uses case-sensitive filenames
because Unix used case-sensitive filenames.
Unix was case-sensitive because Multics was case-sensitive.
Multics was case-sensitive because the ASCII standard
included both an uppercase and a lowercase alphabet. [1]
Why did ASCII do this?
It was a close call, and almost didn't happen.
Telegraphy codes used uppercase only,
or at least did not distinguish upper and lowercase.
Even ITA2, an international standard from 1930,
used a 5-bit code with a shift to switch between letters and figures,
but not upper and lowercase. [2]
Similarly, punched cards used uppercase letters only.
Encodings with different bit patterns for uppercase and lowercase
had been proposed as early as 1959, [3]
though they were not widely implemented.
For example, the IBM 7030 "Stretch" supercomputer,
first installed at Los Alamos National Laboratory in 1961,
had an 8-bit encoding that interleaved uppercase and lowercase alphabets.
[4]
However, the 7030's character encoding did not catch on.
Early on, the ASCII committee concluded that 6-bit encodings (64 bit patterns)
were insufficient to include both control characters and special characters
in addition to the required 26 alphabetics and 10 numerics,
so they decided to use a 7-bit code.
However, ASCII was designed to include a useful 6-bit subset,
which could only fit a single alphabet.
The consideration of a 6-bit, 64-character graphic subset was important to
the standards committee. If the ultimate decision was that columns 6 and 7
would be for graphics, then columns 2 through 7 would contain Space, 94
graphics, and Delete. But, even with the code providing 94 graphics, a
major assumption of the standards committee was that data processing
applications would, for the foreseeable future, be satisfied with a
monocase alphabet (that is, a 64- or less graphic subset) as they had in
the past---that 64-character printers would predominate. So it was
important to be able to derive a 64-character, monocase alphabet, graphic
subset from the code by simple, not complex, logic.
—Charles E. Mackenzie, "Coded character sets: history and development"
(1980), p.228
In fact, some of the committee members
wanted to reserve the remaining space for control characters.
The conclusion of the preceding paragraph is based on the assumption
that two alphabets, small letters and capital letters, would be included
in the 7-bit code and that decision had not yet been made. If the
decision was ultimately made that columns 6 and 7 would contain
controls, then small letters would not be included in the 7-bit code. *
* If the committee did decide for controls in columns 6 and 7, it is
still likely that they would have wanted an alphabet of small letters to be
provided. Presumably, the small letter alphabet would then have been
provided by a caseshift approach.
—Ibid, p.232
Though the committee first formed in 1961,
it wasn't until late 1963
that they finally agreed to include a lowercase alphabet,
largely because of the influence of the
International Telegraph and Telephone Consultative Committee (CCITT).
At the first meeting of ISO/TC97/SC2 in 1963 October 29-31, a resolution
was passed that the lower-case alphabet should be assigned to
columns 6 and 7.
—Ibid, p. 246
The ISO proposal, though, did not include the lower case alphabet and the
five accent marks that the CCITT considered essential.
—Eric Fisher, "The Evolution of Character Codes, 1874-1968", p.22
Why is it useful for filenames to include upper and lowercase?
It can make filenames more intelligible,
such as distinguishing between
the abbreviation for United States ("US")
and the first-person plural objective pronoun ("us")
in paths such as /usr/share/X11/locale/en_US.UTF-8/.
It also allows more possibilities for filenames,
and makes filename comparisons simpler and faster,
because they do not have to convert between uppercase and lowercase.
Bear in mind that it's MUCH more work for a filesystem to be
case-insensitive than -sensitive. A filesystem is case-sensitive by
default, in the simplest case; it can only be made case-INsensitive through
a lot of extra engineering. In UNIX, all the system has to do is sort on
the ASCII values of the first letters of the filenames. In the Mac OS and
Windows, the filesystem has to be smart enough to create synonyms of
various letters — A for a, and so on — and sort accordingly. That takes a
LOT of code. It's a testament to the completeness of the original Mac OS
that in 1984 this was all handled properly, before Windows even brought
lower-case letters to the PC side.
http://xahlee.info/UnixResource_dir/_/fileCaseSens.html
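The difference in effort is easy to demonstrate. In Python,
a case-sensitive comparison is a plain byte-for-byte check,
while a correct case-insensitive comparison needs full Unicode case folding:

    name_a = "Straße.txt"
    name_b = "STRASSE.TXT"

    print(name_a == name_b)                        # False: byte-for-byte
    print(name_a.casefold() == name_b.casefold())  # True: Unicode case folding

    # Naive lowercasing is not enough: "ß".lower() is still "ß", but its
    # casefolded form is "ss", so .lower() calls these two names different.
    print(name_a.lower() == name_b.lower())        # False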
However, there is also no shortage of opinions
that enforcing filename case-sensitivity
-- and even case-sensitivity in general --
was a bad decision. [5]
There are also passionate views
to the opposite effect. [6]
Laying aside that argument for the moment,
why did Windows filenames end up case-insensitive?
Strictly speaking, modern Windows filenames could be case-sensitive,
but they aren't
because the Windows API for opening files is not case-sensitive,
i.e. the default call to CreateFile
does not enable the FILE_FLAG_POSIX_SEMANTICS option.
However, Windows' own NTFS filesystem is case-preserving.
This means that it is possible to mount an NTFS partition with Linux
and make a file called "Myfile.txt" in the same directory as "MYFILE.TXT",
but it will not be possible to read or modify both of those files,
at least not with standard Windows software.
This API behavior exists to maintain compatibility with MS-DOS filesystems. [7]
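The two-files situation described above can be reproduced
with a few lines of Python on any case-sensitive Linux filesystem:

    import os

    # On a case-sensitive filesystem these are two distinct files;
    # through the default Windows API, both names resolve to one file.
    with open("Myfile.txt", "w") as f:
        f.write("first\n")
    with open("MYFILE.TXT", "w") as f:
        f.write("second\n")

    print(sorted(n for n in os.listdir(".") if n.lower() == "myfile.txt"))
    # ['MYFILE.TXT', 'Myfile.txt']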
MS-DOS was built on Tim Paterson's 86-DOS (released in 1980)
and Marc McDonald's FAT filesystem,
which were designed for compatibility with CP/M. [8] [9]
CP/M was created in 1973 by Gary Kildall,
and also used case-insensitive filenames. [12]
Lower case ASCII alphabetics are internally translated to upper
case to be consistent with CP/M file and device name conventions.
http://www.gaby.de/cpm/manuals/archive/cpm22htm/ch1.htm
The CP/M manual does not state explicitly why it uses these conventions,
but Gary Kildall wrote CP/M on a DEC PDP-10 mainframe
running the TOPS-10 operating system
when he was working at Intel. [10]
Consequently, there are many similarities between CP/M and TOPS-10,
including filename case-insensitivity.
(It should be noted that CP/M has also been compared to RT-11,
a DEC operating system for the PDP-11 minicomputer
that is closely related to TOPS-10, [11]
although the influence may not have been as direct.)
Why did TOPS-10 use case-insensitive names?
Because the DEC SIXBIT encoding used for filenames
was optimized for its architecture.
RAD50 was used in FILES-11 and RT-11 disks. It was used to store 3
characters in a 16 bit word. SIXBIT was used on TOPS-10 36bit systems to
store 6 characters in a word. It also allowed for a fast file name search
since the names were all on word boundaries (full filename compair took 2
compair, and 1 mask operation 6+3 file names).
https://lkml.org/lkml/2002/7/30/257
(CP/M was written for an eight-bit architecture,
which is presumably why it used an 8.3 filename
instead of a 6.3 filename.) [13]
Similarly, RT-11 didn't use ASCII for filenames,
but rather an encoding called RADIX-50,
which helped to save memory. [14]
Neither of these encodings are used much anymore,
but their case-insensitivity,
a useful optimization on 1970s hardware,
endures to this day.
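To make the optimization concrete, here is a sketch of PDP-11-style
RADIX-50 packing in Python (character 29 is shown as "%" here,
though on some systems it was unused):

    # 40 characters: space, A-Z, $, ., %, 0-9. Since 40**3 = 64000 < 2**16,
    # three characters fit in one 16-bit word, and a 6.3 filename in three.
    RAD50 = " ABCDEFGHIJKLMNOPQRSTUVWXYZ$.%0123456789"

    def pack_rad50(three_chars):
        """Pack exactly three RADIX-50 characters into a 16-bit word."""
        a, b, c = (RAD50.index(ch) for ch in three_chars.upper().ljust(3))
        return a * 40 * 40 + b * 40 + c

    print(hex(pack_rad50("FIL")))   # one word of a 6.3 filename
    print(pack_rad50("999"))        # 63999, the maximum value

    # There is no room for lowercase: the encoding is uppercase-only
    # by construction, so case-insensitivity falls out for free.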
The lack of agreement on filename case-sensitivity may seem insignificant,
but it has caused persistent difficulties
in cross-platform development. [15] [16] [17]
Developers of cross-platform software try to avoid making assumptions about filename case-sensitivity,
but problems of this ilk crop up
when porting from Windows to Linux or vice-versa. [18]
For example, the Linux port of the Unity engine has issues with case-sensitive filesystems.
Unity does not properly run on a case-sensitive file system (which is
something that Unity users have discovered if they’ve tried to install and
run Unity on a case-sensitive HFS+ file system). This is primarily due to
Unity’s asset database, and how it stores paths to map them to GUID values.
Of course we tried to be smart in the early days, but if you don’t set up a
way to actually verify that what you’re doing works on a case-sensitive
file system, then it will never fail that some well-intentioned programmer
throws a toLower() in somewhere and ruins the party.
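One common workaround (it is "Solution 2" in the porting slides
cited in footnote [18] below) is to fall back to a case-insensitive
directory scan when a lookup fails. A minimal sketch in Python,
handling only the final path component:

    import os

    def open_insensitive(path, mode="r"):
        """Open path; if it is missing, retry ignoring the case of the
        last component. (A sketch; real ports cache whole directories.)"""
        if os.path.exists(path):
            return open(path, mode)
        directory, name = os.path.split(path)
        for entry in os.listdir(directory or "."):
            if entry.lower() == name.lower():
                return open(os.path.join(directory, entry), mode)
        raise FileNotFoundError(path)

    # e.g. open_insensitive("Textures/GRASS01.TGA") would find "grass01.tga".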
[1]
Everything in Multics is case sensitive; Multics permits use of the full
upper and lower case ASCII character set.
Multics command names and programming languages use lowercase by
convention, but users are free to use uppercase letters in path names,
identifiers, user names, etc.
http://www.multicians.org/mgc.html#commandlanguage
Multics was one of the first systems to use upper and lower case letters
freely.
http://www.multicians.org/mga.html#ASCII
Obviously, BCD had no lower-case characters, and Multics did not use BCD
at all, except to output log and crash and tape mount messages from ring
0 to the primitive Selectric operator's console.
http://www.multicians.org/mgb.html#BCD
Since the Multics file system distinguished between upper and lower case,
external names had to be case sensitive, and without much discussion we
chose to have all variable names be case sensitive.
http://www.multicians.org/pl1.html
[2]
See p. 9 of "The Evolution of Character Codes, 1874-1968", Eric Fisher.
http://trafficways.org/ascii/ascii.pdf
https://github.com/ericfischer/ascii
[3]
Simple pattern of correspondence should exist between codes assigned to
upper and lower case alphabetic characters.
—R. W. Bemer
From page 20 of "A proposal for a generalized card code for 256 characters",
Communications of the ACM, Volume 2 Issue 9, Sept. 1959.
http://dx.doi.org/10.1145/368424.368435
[4]
From "Coded character sets: history and development" by Charles E. Mackenzie, 1980.
[5]
Mac & Windows users have to have filenames read to them over the phone
by support techs. They have to be able to write little sticky notes to
their mothers about how to open up the mail program, without worrying
about how the filenames are capitalized. Haven't you ever fumed over a
URL with initial-caps in the folder names in the path, having to fiddle
with capitalization until you get a response that's anything but a 404?
Haven't you ever been secretly pleased that e-mail addresses aren't
case-sensitive?
—Brian Tiemann, On Unix File System's Case Sensitivity (2001)
http://xahlee.info/UnixResource_dir/_/fileCaseSens.html
Anecdotally, case sensitivity in programs is known to be error-prone for
both beginners and experienced users. Bob Frankston, a Multics alumnus
and the co-inventor of VisiCalc, once said it was the biggest mistake
that Multics had inflicted on the world.
—Stavros Macrakis (2003)
https://www.ma.utexas.edu/pipermail/maxima/2003/004483.html
One of the most pernicious problems with C-based languages is that
they're case-sensitive. While this decision may have made sense in 1972
when the language was created, one wonders why the sins of Kernighan and
Ritchie have been blindly perpetuated for the last thirty-three years.
[ . . . ]
Unless you have extremely compelling reasons to make something
case-sensitive, case insensitivity is a much more human being friendly
design choice. Designing software that's easier for machines is
questionable at best.
—Jeff Atwood, The Case For Case Insensitivity (2005)
http://blog.codinghorror.com/the-case-for-case-insensitivity/
There is no longer any excuse for making humans learn and handle the
quirks of the way computers store upper- and lower-case characters.
Instead, software should handle the quirks of human language.
—Brian Hauer, Case-sensitivity is the past trolling us (2014)
http://tiamat.tsotech.com/case-sensitivity-sucks
Since it appears to have manifested out of opinion rather than
necessity, it could be said case-sensitivity is the worst way that
modern technology sucks.
—Greg Raiz (2007)
http://www.raizlabs.com/graiz/2007/02/11/linuxunix-case-sensitivity/
This is really stupid, it causes a ton of problems and there is no
longer any good reason to have case sensitivity in an OS.
—Julian, OddThinking (2005)
http://www.somethinkodd.com/oddthinking/2005/10/27/the-case-for-case-preserving-case-insensitivity/
[6]
Many of us consider those filesystems which cannot preserve case, but
which accept "input" in random case, to be so utterly broken as to be
undeserving of any attention whatsoever. They create a situation where
the computer effectively considers the users to be too stupid or blind
or whatever to be able to say what we mean accurately.
—Greg A. Woods (2003)
https://lists.nongnu.org/archive/html/info-cvs/2003-11/msg00127.html
Why oh why on Earth engineers at Microsoft decided to make Windows case
insensitve [sic] and then use camel case anyway, wherever possible?
It makes case-sensitive systems and their sysadmins cry :-(
—u/bwosc (2015)
https://www.reddit.com/r/sysadmin/comments/2w6c8g/case_insensitive_windows_rant/
Why are computer file names and conventions and protocols so messed up?
It's bizarre -- and Microsoft has been one of the worst offenders with
one of the most powerful positions and opportunities to make it a better
filename-naming world.
[ . . . ]
And, Microsoft dares to allow mixed case naming, but does case
insensitive handling of file names... don't even get me started about
some of the bizarre results and buggy behavior I've traced to that. I
only wish I'd had a chargeback code for all of the time I've spent
fixing and debugging systems that all come back to the file naming.
Sigh, again.
—yagu (2006)
http://slashdot.org/comments.pl?sid=190747&cid=15690704
The old DOS/Mac people thought case insensitivity was a "helpful"
idea, and that was understandable - but wrong - even back in the 80's.
They are still living with the end result of that horrendously bad
decision decades later. They've _tried_ to fix their bad decisions,
and have never been able to (except, apparently, in iOS where somebody
finally had a glimmer of a clue).
—Linus Torvalds (2018)
https://patchwork.kernel.org/cover/10717177/
[7]
Do not assume case sensitivity. For example, consider the names OSCAR,
Oscar, and oscar to be the same, even though some file systems (such as
a POSIX-compliant file system) may consider them as different. Note that
NTFS supports POSIX semantics for case sensitivity but this is not the
default behavior.
http://msdn.microsoft.com/en-us/library/windows/desktop/aa365247(v=vs.85).aspx
[8]
Every operating system has basic functions like reading and writing disk
files. The API defines the exact details of how to make it happen and
what the results are. For example, to “open” a file in preparation for
reading or writing, the application would pass the location of an
11-character file name and the function code 15 to CP/M through the
“Call 5” mechanism. The very same sequence would also open a file in
DOS, while, say, UNIX, did not use function code 15, 11-character file
names, or “Call 5” to open a file.
—Tim Paterson (2007)
http://dosmandrivel.blogspot.com/2007/08/is-dos-rip-off-of-cpm.html
As I noted when I discussed the old MS-DOS wildcard matching rules,
MS-DOS worked hard at being compatible with CP/M. And CP/M used 8.3
filenames.
—Raymond Chen (2009)
https://blogs.msdn.microsoft.com/oldnewthing/20090610-00/?p=17953/
[9]
The FAT file system's restrictions on naming files and directories are
inherited from CP/M. When Paterson was writing 86-DOS one of his primary
objectives was to make programs easy to port from CP/M to his new
operating system. He therefore adopted CP/M's limits on filenames and
extensions so the critical fields of 86-DOS File Control Blocks (FCBs)
would look almost exactly like those of CP/M. The sizes of the FCB
filename and extension fields were also propagated into the structure of
disk directory entries
http://spider.seds.org/spider/OS2/HPFS/fat.html
[10]
Gary Kildall developed CP/M on a DEC PDP-10 minicomputer running the
TOPS-10 operating system. Not surprisingly, most CP/M commands and file
naming conventions look and operate like their TOPS-10-counterparts. It
wasn’t pretty, but it did the job.
—Robert X. Cringely, Accidental Empires, Chapter 4 — Amateur Hour
http://www.cringely.com/2013/02/18/accidental-empires-chapter-4-amateur-hour/
CP/M and ISIS in operation have some general similarities to interactive
operating systems on minicomputers and mainframes such as the DEC PDP-10
"TOPS-10" OS. Kildall used such systems to develop and run his
cross-assemblers and compilers, which became Intel products; and later
to develop his own products which ran "native" on CP/M systems.
—Herbert R. Johnson, CP/M and Digital Research Inc. (DRI) History
http://www.retrotechnology.com/dri/d_dri_history.html
Kildall said that PL/M was ‘‘the base for CP/M,’’ even though the
commands were clearly derived from Digital’s, not IBM’s software. For
example, specifying the drive in use by a letter; giving file names a
period and three-character extension; and using the DIR (Directory)
command, PIP, and DDT were DEC features carried over without change. [100]
[ . . . ]
99. Gary Kildall, ‘‘CP/M: A Family of 8- and 16-Bit Operating Systems,’’
Byte, (June 1981): 216–229. Because of the differences between DEC
minicomputers and the 8080 microprocessor, the actual code of CP/M was
different and wholly original, even if the syntax and vocabulary were
similar.
100. The above argument is based on PDP-10 and CP/M manuals in the
author’s possession, as well as conversations with Kip Crosby, to whom I
am grateful for posting this question over an Internet discussion forum.
—Paul E. Ceruzzi, page 238 of "A History of Modern Computing", 2nd. edition published 2003 by MIT Press.
[11]
From a post on the comp.sys.tandy Usenet group:
Of course, CP/M itself is an almost exact knock off of DECs PDP-11 OS,
RT-11, an operating system that dates back to the early seventies, and
RT-11 shows its roots in TOPS-10, which goes back another year or two.
For some reason, all the historians tracing the source of MS-DOS
mysteriously stop at CP/M, even when command sets and utility syntaxes
are compared side-by-side. Who had a PIP utility first? Why, DEC, not
Digital Research.
The joke in the seventies that "Digital Research" was a typographical
error and the companies real name was "Digital [Equipment Corporation]
Rehashed", for RT-11, TOPS-10 and RSTS/E all predated CP/M by a lot and
yet have the same command syntax.
https://groups.google.com/forum/#!msg/comp.sys.tandy/EcfhcRv9gEU/fNu_h9fCe3AJ
From a post on the alt.folklore.computers Usenet group:
Maybe we do need Kildall for the next step, but when I saw CP/M
version 1 it appeared closest to a dialect of RT-11, so I've always
figured that RT-11 was the closest ancestor. After that, it began
to drift. If I recall correctly, V1's prompt was the DECcish ".",
but in V2 it became "> ". Therefore, it would appear that MS-DOS
got its start from CP/M V2. It's a pity MS-DOS didn't start from
RT-11, which had multitasking, interrupt driven I/O, and all the
other good stuff that is easy to fit in a well designed 8KB kernel.
https://groups.google.com/forum/#!topic/alt.folklore.computers/BxRlG1tYv8o
Gary Kildall's CP/M started out as his own reimplementation of RT-11 for
the Intel 8080.
http://blu.org/mhonarc/discuss/2011/10/msg00112.php
[12]
CP/M did this conversion internally.
It should also be noted that all alphabetic lower case letters in file
and drive names are always translated to upper case when they are
processed by the CCP [Console Command Processor].
[ . . . ]
Further, recall that the CCP always translates lower case characters to
upper case characters internally. Thus, lower case alphabetics are
treated as if they are upper case in command names and file references
https://archive.org/stream/Intro_to_CPM_Feat_and_Facilities/Intro_to_CPM_Feat_and_Facilities_djvu.txt
[13]
As for the 8.3, look at the format of a CP/M directory entry. 16
bytes so they fill a disk block, not RAD50, 8 bytes for name, 3 for
extension, and I forget the rest, but it includes pointers to the
data.
https://groups.google.com/d/msg/alt.folklore.computers/fqXomGO4I1I/ub_hJ2WxXHwJ
[14]
... files were located via the directory, which resided in a fixed
location at the beginning of the hard drive. The directory consisted of
a single array of entries, each with a 6.3 character file name formatted
in DEC’s Radix-50 format. A file’s directory entry indicated the address
of the first block of the file.
http://cryptosmith.com/2013/10/19/digitals-rt-11-file-system/
RADIX50 is a character coding system used in earlier Digital Equipment
Corporation computers, such as the PDP-10, DECsystem-10 and
DECsystem-20. It was implemented as a way to pack as many characters
into as few bits as possible.
RADIX50 actually contains 40 codes, or 50 in octal. Because this is not
a power of two, the PDP-10 processor had instructions to pack several
RADIX-50 words into a single 36-bit word or extract RADIX-50 words from
a 36-bit word.
http://nemesis.lonestar.org/reference/telecom/codes/radix50.html
[16]
One problem is that the file-system NTFS, that is used by most modern
Windows Versions, is (by default) only case-preserving (hello.c and
Hello.C are the same file, when in the same folder). The
OpenFOAM-sources need a fully case-sensitive file-system and can't even
be unpacked properly on a Windows system (see [2]).
http://openfoamwiki.net/index.php/Main_FAQ#Why_isn.27t_there_a_Windows_port_of_OpenFOAM_.3F
[17]
Issues of alphabetic case in pathnames are a major source of problems.
In some file systems, the customary case is lowercase, in some
uppercase, in some mixed. Some file systems are case-sensitive (that is,
they treat FOO and foo as different file names) and others are not.
https://www.cs.cmu.edu/Groups/AI/html/cltl/clm/node205.html
The main difficulty in dealing with names of files is that different
file systems have different naming formats for files.
https://www.cs.cmu.edu/Groups/AI/html/cltl/clm/node203.html
[18]
http://adrienb.fr/blog/wp-content/uploads/2013/04/PortingSourceToLinux.pdf
- Linux filesystems are case-sensitive
- Windows is not
- Not a big issue for deployment (because everyone ships packs of some sort)
- But an issue during development, with loose files
- Solution 1: Slam all assets to lower case, including directories, then tolower all file lookups (only adjust below root)
- Solution 2: Build file cache, look for similarly named files
In Linux and other Unix-derived operating systems,
the only characters that may not appear
in the name of a file or directory [21]
are the slash /
(which is used to delimit paths)
and the ASCII null \0
(which is used to terminate strings in C). [22]
Windows has the same restrictions,
as well as many other restrictions which are considerably more complex
and are partly the result
of backwards compatibility with CP/M pseudofiles.
This has had long-term consequences,
such as imposing some surprising restrictions on URLs
in Microsoft's web application framework, ASP.NET
(these were relaxed in a later version).
Windows also does not permit filenames to contain colons,
due to their use in delimiting drive names like C:\.
This causes issues in sharing files across platforms.
For example, a UNIX file name can use a colon (:), but a Windows file name
cannot use a colon (:). If a UNIX user attempts to create a file with a
Windows illegal character on a Windows Services for UNIX network file
system (NFS) share, the attempt is unsuccessful and the UNIX client
computer receives an input or output error.
https://support.microsoft.com/en-us/kb/289627
This also makes filenames containing timestamps somewhat inconvenient.
Since filenames cannot contain colons,
an ISO 8601 timestamp such as 1970-01-01T00:00:00Z
cannot be part of a valid filename.
Windows software uses various workarounds,
such as removing the colon entirely
or replacing it with a similar-looking Unicode character. [19]
(It should be acknowledged that on Linux
the names of directories in $PATH cannot contain colons either, [20]
but such restrictions do not apply to filenames.)
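One clean workaround is the ISO 8601 "basic format",
which omits the colons entirely rather than substituting characters.
A sketch in Python:

    from datetime import datetime, timezone

    now = datetime.now(timezone.utc)
    print(now.strftime("%Y-%m-%dT%H:%M:%SZ"))  # extended format: invalid in Windows filenames
    print(now.strftime("%Y%m%dT%H%M%SZ"))      # basic format: valid on both systems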
[21]
As discussed in this StackOverflow question:
https://stackoverflow.com/questions/1976007/what-characters-are-forbidden-in-windows-and-linux-directory-names
When Steve Bourne was writing his Unix shell (which came to be known as
the Bourne shell), he made a directory of 254 files with one-character
names, one for each byte value except '\0' and slash, the two
characters that cannot appear in Unix file names. He used that directory
for all manner of tests of pattern-matching and tokenization. (The
test directory was of course created by a program.) For years
afterwards, that directory was the bane of file-tree-walking programs;
it tested them to destruction.
—Brian W. Kernighan and Rob Pike, "The Practice of Programming",
Chapter 6: Testing, p. 158
https://books.google.com/books?id=j9T6AgAAQBAJ&lpg=PP1&dq=the%20practice%20of%20programming&pg=PA158#v=onepage&q=When%20Steve%20Bourne
This is also explicitly stated in the POSIX standard.
The characters composing the name may be selected from the set of all
character values excluding the slash character and the null byte.
http://pubs.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap03.html
The bytes composing the name shall not contain the <NUL> or <slash>
characters.
http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_170
[22]
The wisdom of this decision is a matter of some debate.
Dennis Ritchie has explained the rationale for using a null-terminator:
In BCPL, the first packed byte contains the number of characters in the
string; in B, there is no count and strings are terminated by a special
character, which B spelled `*e'. This change was made partially to
avoid the limitation on the length of a string caused by holding the
count in an 8- or 9-bit slot, and partly because maintaining the count
seemed, in our experience, less convenient than using a terminator.
Null-terminated strings do have some drawbacks,
such as making certain optimizations more difficult,
and exposing unwary programs to buffer overflow bugs.
On the other hand, length-prefixed strings such as those in Pascal
tend to have their own difficulties,
such as the maximum string length imposed by the size of the count field.
There is extensive discussion here:
https://utcc.utoronto.ca/~cks/space/blog/programming/CNullStringsDefense
In any case, both Linux and Windows use null-terminated strings,
as do other modern operating systems.
Windows has built-in support for its own NTFS filesystem,
UDF (used for some CDs and DVDs),
and the legacy FAT16/FAT32/exFAT family.
All other filesystems require installation of third-party software.
Linux has drivers for almost all file systems
that can be legally mounted without paying royalties,
including ones that don't see much use nowadays, like Amiga file systems.
It can also mount FAT and NTFS filesystems,
despite Microsoft's lucrative patent licensing deals
and ongoing litigation against Android manufacturers
and other companies that use the Linux kernel's FAT drivers.
For the system partition,
Linux users can choose between the usual ext3 journaling filesystem
and up-and-coming filesystems like Btrfs.
Unlike FAT and NTFS filesystems,
ext3 and Btrfs do not require defragmentation to maintain good performance.
(Realistically, though,
defragmentation isn't that important for NTFS, either.)
Finally, Linux permits unprivileged users
to run their own filesystems via FUSE.
This has many practical benefits,
such as accessing cloud storage as if it were an ordinary directory.
There is a project to bring FUSE to Windows,
but it is no longer maintained
and its various forks are not as mature as the Linux implementation.
UTF-8 has many practical advantages over UTF-16.
It is a superset of ASCII,
so it is backwards-compatible with existing text files.
Zero bytes do not appear at any point in a valid UTF-8 representation,
so strcpy() still works.
It is self-synchronizing, i.e. it is possible to resynchronize
after a lost or corrupted code point without re-reading the entire string.
It is more portable because it does not require a byte-order mark
and is less likely to be mistaken for other encodings.
Internet Explorer has been known to have security issues with UTF-16.
If the Windows API were designed today,
it would probably use UTF-8.
The Unicode Consortium primarily recommends UTF-16
for compatibility with Java and the Windows API.
In principle, a fixed-width encoding would allow
constant-time addressing of single characters,
but UTF-16 is only fixed-width within the Basic Multilingual Plane
(characters outside it require surrogate pairs),
and in practice most programming languages
do not provide data types for code points anyway,
with the exceptions of Go and Rust.
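A few of these properties can be seen directly in Python:

    text = "naïve"

    utf8 = text.encode("utf-8")
    print(utf8)              # b'na\xc3\xafve': the ASCII bytes are unchanged
    print(b"\x00" in utf8)   # False for any valid UTF-8 string

    utf16 = text.encode("utf-16-le")
    print(utf16)             # b'n\x00a\x00...': ASCII characters gain zero bytes
    print(b"\x00" in utf16)  # True, which is what breaks strcpy()

    # Self-synchronization: UTF-8 continuation bytes all match 0b10xxxxxx,
    # so a reader can skip to the next lead byte after corruption.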
On Windows, the file extension is the sole determiner
of what program runs when opening a given filetype.
This makes it easier to dupe a Windows user
into unintentionally running malware.
Also, if the file extensions for different filetypes happen to collide,
as they inevitably do,
one program must take default precedence over the other for that file extension.
For example, there are a lot of different file formats with a .dat file extension,
but only one application gets to open them by default.
On Linux, filetypes are determined by a combination of
filesystem metadata (e.g. execute permissions),
heuristics based on file signatures (a.k.a. "magic numbers"),
and .desktop configuration files with mimetype information
(which includes file extensions).
A file's executable status is separate from its file extension,
and an executable text file written in a scripting language
can control how it is run via the first-line shebang convention,
e.g. #!/usr/bin/python3 -i.
Windows does not support shebang lines,
but languages that emphasize cross-platform compatibility,
such as Python,
have implemented work-arounds.
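For illustration, here is a minimal sketch of signature-based detection,
the same idea used by the Unix file(1) command
(the signature table is abbreviated, of course):

    MAGIC = {
        b"\x89PNG\r\n\x1a\n": "PNG image",
        b"PK\x03\x04": "ZIP archive",
        b"%PDF": "PDF document",
        b"\x7fELF": "ELF executable",
    }

    def identify(path):
        with open(path, "rb") as f:
            head = f.read(8)
        for signature, kind in MAGIC.items():
            if head.startswith(signature):
                return kind
        return "unknown"

    print(identify("/bin/ls"))   # "ELF executable" on a typical Linux system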
Permissions are a big topic in multi-user computing,
and both Linux and Windows have adapted over time,
each with various advantages and disadvantages. [23] [24]
However, here is a specific example
of a relatively simple, single-user permissions feature:
it is sometimes desirable to set old files as read-only,
so that they are still easily accessible
(i.e. not compressed in a .zip file),
but are less likely to be accidentally deleted, moved, or modified.
On Windows, the content of a read-only file cannot be altered,
but the file itself can be moved, renamed, or deleted,
because the folder it is in cannot have a read-only status.
In Linux, by contrast, a read-only directory cannot have files added to it,
and files in such a directory cannot be moved, renamed, or deleted
without first removing the read-only status from the directory they are in.
Modifications of the contents of the files
depend on the individual file permissions.
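A short sketch of the Linux behavior
(run as a regular user; root bypasses permission checks):

    import os
    import stat

    os.makedirs("archive", exist_ok=True)
    with open("archive/old.txt", "w") as f:
        f.write("keep me\n")

    # r-x: the directory can be read and entered, but not modified.
    os.chmod("archive", stat.S_IRUSR | stat.S_IXUSR)

    try:
        os.remove("archive/old.txt")
    except PermissionError as e:
        print("cannot delete:", e)   # fails because the directory is read-only

    os.chmod("archive", stat.S_IRWXU)   # restore write permission to clean up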
[23]
Unix permissions, for example, are not a panacea: https://unix.stackexchange.com/questions/164303/single-user-for-sharing-vs-multiple-users
[24]
NTFS permissions have their own issues, e.g. https://serverfault.com/questions/31709/how-to-workaround-the-ntfs-move-copy-design-flaw
These are limitations of the Windows platform
which are not intrinsic to the operating system,
but are the result of default behavior
or restrictions on the Windows ecosystem.
When accessing external volumes such as flash drives,
Windows assigns a different capital letter to each volume,
each letter corresponding to a different absolute path root.
This is necessary for backwards compatibility with MS-DOS,
but it is not without drawbacks.
Perhaps the most obvious problem
is that there are only 26 letters in the English alphabet.
But what does this mean in practice?
One consequence is that the assigned drive letter
may be different when a drive is reconnected.
This means that, for example,
applications that track recently used files
will look for files under the old drive letter,
and be unable to find the files.
I have a problem with Word when working with documents on my flash drive.
If I insert the drive days later and try to use the recently used file
list, Word sometimes says it can’t find the document.
I’ve worked out that when I insert the flash drive it’s not always using
the same drive letter – it’s F or G drive but occasionally even later in
the alphabet.
How can I change the flash drive letter or, even better, make it appear as
the same drive letter each time?
http://new.office-watch.com/2008/make-a-consistent-drive-letter-or-path-to-a-removable-drive/
Fortunately, there is a solution: NTFS mount points.
Volume mount points are robust against system changes that occur when
devices are added or removed from a computer.
https://technet.microsoft.com/en-us/library/Cc938934.aspx
If you're running out of drive letters, one trick is to use a mount point
for each logical drive that you are going to bring into Windows; this way,
performance can be contained to a logical drive and still conform to your
drive letter standards.
[ . . . ]
There are many scenarios in which you would want a large number of drives,
such as multiple databases for Microsoft SQL Server or Exchange Server
installations. Exchange databases are notorious for needing their own
drives per mailbox store and, if you provision out well, you will quickly
run out of drive letters.
—Rick Vanover
http://www.techrepublic.com/blog/the-enterprise-cloud/use-mount-points-if-you-run-out-of-windows-drive-letters/
Unfortunately, Windows doesn't use mount points by default
for external hard drives or flash drives,
possibly because mount points behave slightly differently than drive letters.
The problem is the recycle bin. This "undo" option is maintained with a
hidden system file that is on the partition that holds the files being
deleted. Unfortuantely, when the command to delete a folder is given, the
system attempts to delete the folder using the mount point folder's Master
File Table, and not the subfolder's Master File Table. The mount point
folder's MFT doesn't host the record, and an access denied message is
kicked back to you for having the temerity to try and recycle a directory
which apparently doesn't even exist! The only solution for this is to not
recycle subfolders and directories, but to outright delete them.
http://getyouriton.blogspot.com/2009/08/serious-gotchas-with-mounted-drives-or.html
While NTFS filesystems have a root directory,
Windows has no unique root directory;
instead, each drive has its own root.
https://stackoverflow.com/questions/151860/root-folder-equivalent-in-windows
My Computer roughly corresponds to a root directory in concept,
and looks like a folder when viewed in Windows Explorer,
but there is no My Computer folder anywhere on the filesystem.
Instead, My Computer is a virtual folder.
Unlike file system folders, users cannot create new virtual folders
themselves. They can only install ones created by non-Microsoft developers.