POSIX file semantics in Windows
Some Background:
BTW: In the context of this post, ‘Windows’ always means ‘Windows NT’.
The Windows [[Kernel_(Computing)|kernel]] is designed to support multiple independent subsystems (different application environments) atop its [[Native_API|native API]]. The two most common examples of Windows subsystems are the [[Windows_API|Windows API]], formerly known as Win32 and used by the overwhelming majority of Windows programs, and the [[POSIX]] subsystem ((Supposedly created because POSIX compliance was a checklist requirement for US government software purchases)) provided by [[Microsoft_Windows_Services_for_UNIX|SFU/SUA/Interix]] ((Originally Windows came with a [[Microsoft_POSIX_subsystem|basic POSIX subsystem]] out of the box, but that died with the release of Windows XP and was replaced by Interix)), henceforth simply called Interix.
Note: Just to muddy the waters, Cygwin provides a POSIX environment layered on top of the Windows API.
This has nothing to do with the POSIX subsystem.
The Problem
The Windows API is, by default, case-preserving but not case-sensitive, everywhere. [[NTFS]] (the standard Windows filesystem), on the other hand, is POSIX compliant, presumably because the POSIX subsystem could not have claimed compliance otherwise. Among other things, this means it supports case sensitivity, so it’s possible to create two files whose names differ only in case, which mightily confuses most Windows software. Worse, POSIX allows file names to contain characters that the Windows API rejects, such as ‘:’, ‘?’ and ‘*’. Now you don’t even need two similarly named files to cause confusion: a single file named, say, ‘How Soon is Now?.flac’ is enough.
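The Windows restrictions are easy enough to check mechanically. Below is a minimal sketch, assuming the commonly documented Win32 naming rules (the reserved characters < > : " / \ | ? *, control characters, trailing dots and spaces, and legacy device names such as CON or NUL); the function name is_windows_safe is my own invention, not part of any library.

```python
# Characters the Windows API rejects in a file name. NUL and the control
# characters 1-31 are rejected too and are checked separately below.
WINDOWS_FORBIDDEN = set('<>:"/\\|?*')

# Legacy DOS device names are reserved, with or without an extension.
RESERVED_NAMES = (
    {"CON", "PRN", "AUX", "NUL"}
    | {f"COM{i}" for i in range(1, 10)}
    | {f"LPT{i}" for i in range(1, 10)}
)


def is_windows_safe(name: str) -> bool:
    """Return True if a single path component can be used via the Windows API."""
    if any(ch in WINDOWS_FORBIDDEN or ord(ch) < 32 for ch in name):
        return False
    # The Win32 layer strips trailing dots and spaces, so such names
    # cannot be addressed as-is.
    if name.endswith((" ", ".")):
        return False
    # "nul.txt" is just as reserved as "NUL".
    if name.split(".")[0].upper() in RESERVED_NAMES:
        return False
    return True


print(is_windows_safe("How Soon is Now.flac"))   # True
print(is_windows_safe("How Soon is Now?.flac"))  # False
```

Note that only the question mark separates the two example names; NTFS itself would happily store either.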
So What?
Normally this wouldn’t be an issue; coming from Unix it seems pretty annoying, but livable. A problem does arise, however, when a filesystem is shared between Windows and an OS that will happily create files using the full range of names POSIX allows (in practice, most operating systems other than Windows).
Say some unsuspecting user (cough) is using [[NTFS-3G]] to access NTFS filesystems from Linux; by default it doesn’t restrict the filenames allowed to the Windows-safe subset.
Well, I have ((Or rather ‘had’ – it’s all been renamed for Windows-safety now)) a lot of music filed away such that each file is named after the title of the track, and a handful of tracks include characters which are verboten under Windows, most commonly a colon or a question mark. It took quite a while to figure out why a lot of my music seemed to be mysteriously inaccessible from Windows, and wouldn’t it be nice if there were some way to get to those files, even if just to rename them? Well, apparently “all the files are accessible if SFU is installed on Windows”. There’s only one problem with that: Interix/SFU isn’t available for [[Windows_XP_Professional_x64_Edition|the version of Windows I’m using]]. Sigh.
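Since the files are perfectly reachable from the Linux side, renaming them there is the pragmatic fix. Below is a minimal sketch, assuming it runs on Linux against the NTFS-3G mount; the underscore replacement and the names windows_safe_name and rename_tree are arbitrary choices of mine, and names that collide only by case are not detected.

```python
import os
import sys

# Characters Windows rejects ('/' can't occur in a POSIX name anyway).
FORBIDDEN = '<>:"\\|?*'
REPLACEMENT = "_"  # arbitrary choice


def windows_safe_name(name: str) -> str:
    """Replace forbidden/control characters and strip trailing dots and spaces."""
    cleaned = "".join(
        REPLACEMENT if (ch in FORBIDDEN or ord(ch) < 32) else ch for ch in name
    )
    return cleaned.rstrip(" .") or REPLACEMENT


def rename_tree(root: str, dry_run: bool = True) -> list:
    """Rename offending entries under root; return the (old, new) path pairs."""
    changes = []
    # Walk bottom-up so renaming a directory never invalidates paths
    # that still have to be visited beneath it.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in filenames + dirnames:
            new = windows_safe_name(name)
            if new == name:
                continue
            old_path = os.path.join(dirpath, name)
            new_path = os.path.join(dirpath, new)
            if os.path.exists(new_path):
                # Don't clobber an existing file; leave it for manual review.
                print(f"skipping {old_path}: {new_path} exists", file=sys.stderr)
                continue
            changes.append((old_path, new_path))
            if not dry_run:
                os.rename(old_path, new_path)
    return changes
```

Calling rename_tree on, say, a hypothetical /mnt/windows/Music only reports what would change; passing dry_run=False actually performs the renames.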
What Next?
Clearly Windows itself is capable of dealing with the full range of filenames, otherwise Interix wouldn’t be able to use them. Is there any way for the Windows API to access this capability? Investigations reveal the existence of a promising-sounding flag to CreateFile: FILE_FLAG_POSIX_SEMANTICS. This flag’s description doesn’t agree with its name: the description only talks about case sensitivity. Is that an oversight in the description, or a poor choice of name? Further investigation requires a small detour…
At some point Microsoft decided that the ability to have files whose names differ only in case could be dangerously confusing for some of the extant Windows software (think antivirus scanners trying to scan MALWARE.dll vs. malware.dll), so case sensitivity is disabled by default across the board. It can be enabled for subsystems other than the Windows API, and for the Windows API via FILE_FLAG_POSIX_SEMANTICS, by setting a registry value:
Under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\kernel, there needs to be a DWORD called ObCaseInsensitive, with the value ‘0’.
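The same change can be scripted with Python’s standard winreg module instead of regedit. Below is a minimal sketch, assuming Windows, an elevated (administrator) process, and that a reboot is still required afterwards; the function names are my own.

```python
# Registry location of the ObCaseInsensitive value described above.
KERNEL_KEY = r"SYSTEM\CurrentControlSet\Control\Session Manager\kernel"
VALUE_NAME = "ObCaseInsensitive"


def case_sensitivity_enabled() -> bool:
    """Return True if ObCaseInsensitive is a DWORD set to 0 (Windows only)."""
    import winreg  # standard library, but only available on Windows

    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, KERNEL_KEY) as key:
            value, value_type = winreg.QueryValueEx(key, VALUE_NAME)
    except FileNotFoundError:
        return False  # treat a missing value as "not enabled"
    return value_type == winreg.REG_DWORD and value == 0


def enable_case_sensitivity() -> None:
    """Set ObCaseInsensitive to 0; needs admin rights and a reboot to apply."""
    import winreg

    with winreg.OpenKey(
        winreg.HKEY_LOCAL_MACHINE, KERNEL_KEY, 0, winreg.KEY_SET_VALUE
    ) as key:
        winreg.SetValueEx(key, VALUE_NAME, 0, winreg.REG_DWORD, 0)
```

The winreg import lives inside the functions so the module can at least be loaded on other platforms.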
Having set that value and rebooted, the FILE_FLAG_POSIX_SEMANTICS flag should now have an effect. A quick Python test, using the win32file module from pywin32, confirms this:
    import win32file  # from pywin32
    f1 = win32file.CreateFile('foo', win32file.GENERIC_WRITE, 0, None, win32file.CREATE_NEW, win32file.FILE_FLAG_POSIX_SEMANTICS, None)
    f2 = win32file.CreateFile('FOO', win32file.GENERIC_WRITE, 0, None, win32file.CREATE_NEW, win32file.FILE_FLAG_POSIX_SEMANTICS, None)
Both files are created successfully and can be seen in Windows Explorer, though Explorer can’t tell them apart.
So, now we get to find out whether that flag really does allow POSIX semantics as its name suggests, or if the description is correct and all this was for nothing:
    f3 = win32file.CreateFile('foo?', win32file.GENERIC_WRITE, 0, None, win32file.CREATE_NEW, win32file.FILE_FLAG_POSIX_SEMANTICS, None)
Result: error: (123, 'CreateFile', 'The filename, directory name, or volume label syntax is incorrect.')
Damn.
Update:
Cygwin 1.7 claims to support case sensitivity when the appropriate registry value is set, as well as the full range of characters in filenames. Since Cygwin does use the Native API for some of its operations, I dared to hope that perhaps they’d performed some black magic to make this work properly. Sadly, attempting to read an existing file with a question mark in its name produces ‘cannot access <filename>: No such file or directory’, so I guess they’re just faking it.
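As far as I understand the Cygwin User’s Guide, “faking it” is roughly what happens: Cygwin 1.7 stores characters that Windows forbids by shifting them into the Unicode private use area (U+F000 plus the character code), so a name typed as ‘foo?’ is looked up on disk as ‘foo\uf03f’, and a file NTFS-3G created with a literal ‘?’ never matches. Below is a sketch of that mapping under this assumption; the function names are mine, and edge cases such as trailing dots are deliberately ignored.

```python
# Characters Cygwin 1.7 is documented (as I read it) to transpose into
# U+F000 + code; control characters 0x01-0x1f are handled the same way.
CYGWIN_TRANSPOSED = set('"*:<>?|')
OFFSET = 0xF000


def _transposable(ch: str) -> bool:
    return ch in CYGWIN_TRANSPOSED or 0 < ord(ch) < 32


def cygwin_on_disk_name(posix_name: str) -> str:
    """Approximate the name Cygwin would actually store on NTFS."""
    return "".join(
        chr(OFFSET + ord(ch)) if _transposable(ch) else ch for ch in posix_name
    )


def cygwin_posix_name(on_disk_name: str) -> str:
    """Reverse mapping: the name Cygwin would show for an on-disk name."""
    out = []
    for ch in on_disk_name:
        code = ord(ch)
        if OFFSET < code <= OFFSET + 0xFF and _transposable(chr(code - OFFSET)):
            out.append(chr(code - OFFSET))
        else:
            out.append(ch)  # includes a literal '?' created by NTFS-3G
    return "".join(out)
```

Under this model, a literal ‘?’ on disk is passed through unchanged but can never be reached by typing ‘?’, which matches the ‘No such file or directory’ error above.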