Will string.Trim() ever remove *valid* characters from a filename?

Tags: filenames, filesystems, invalid-characters, .net, whitespace

I'm creating a class to store a filename. To do so, I need to know exactly which characters are invalid anywhere in a filename, and exactly which characters are invalid as leading/trailing characters.

Windows Explorer trims leading and trailing white-space characters automatically when naming a file, so I need to trim the same characters when constructing a filename instance.

I thought about using string.Trim(), but it would be naive to assume the default set of characters it trims coincides exactly with the invalid leading/trailing filename characters of the OS.
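The design described above (trim an explicit set of leading/trailing characters, then reject invalid ones) can be sketched roughly as follows. This is a hypothetical Python illustration, not the asker's class: the `FileName` name, `INVALID_CHARS` (assumed to mirror what `Path.GetInvalidFileNameChars()` returns on Windows) and `TRIM_CHARS` (a placeholder containing only U+0020, since the exact set Explorer strips is what the question is asking) are all assumptions.

```python
# Hypothetical sketch of a filename class that trims only an explicit set.
# Assumption: INVALID_CHARS mirrors .NET's Path.GetInvalidFileNameChars()
# on Windows ('"', '<', '>', '|', ':', '*', '?', '\\', '/', U+0000..U+001F).
INVALID_CHARS = frozenset('"<>|:*?\\/') | frozenset(chr(c) for c in range(32))

# Assumption: placeholder trim set (plain space only), NOT a verified list
# of what Windows Explorer strips.
TRIM_CHARS = " "


class FileName:
    """Immutable filename: trims TRIM_CHARS from both ends, then validates."""

    def __init__(self, raw: str) -> None:
        # str.strip(chars) removes only the listed characters from both ends,
        # unlike a default whitespace trim.
        name = raw.strip(TRIM_CHARS)
        if not name:
            raise ValueError("filename is empty after trimming")
        bad = sorted({ch for ch in name if ch in INVALID_CHARS})
        if bad:
            raise ValueError(f"invalid characters: {bad!r}")
        self._value = name

    @property
    def value(self) -> str:
        return self._value

    def __str__(self) -> str:
        return self._value
```

Keeping the trim set as an explicit constant means a leading U+00A0 (no-break space) survives construction, whereas a default whitespace trim would silently drop it.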

Documentation for string.Trim() says that it trims the following characters by default:
U+0009, U+000A, U+000B, U+000C, U+000D, U+0020, U+0085, U+00A0, U+1680, U+2000, U+2001, U+2002, U+2003, U+2004, U+2005, U+2006, U+2007, U+2008, U+2009, U+200A, U+200B, U+2028, U+2029, U+3000, U+FEFF

Unfortunately, some of the above characters are NOT invalid in a filename, because they aren't in the character set returned by System.IO.Path.GetInvalidFileNameChars.
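To see how large the mismatch is, the two sets can be compared directly. The Python sketch below copies the default trim list quoted above and assumes the Windows result of `Path.GetInvalidFileNameChars()` (the nine punctuation characters plus U+0000 through U+001F); that invalid set is an assumption about the .NET API, so verify it against your runtime.

```python
# Default String.Trim() set, copied from the documentation quoted above.
TRIM_DEFAULT = [
    0x0009, 0x000A, 0x000B, 0x000C, 0x000D, 0x0020, 0x0085, 0x00A0,
    0x1680, 0x2000, 0x2001, 0x2002, 0x2003, 0x2004, 0x2005, 0x2006,
    0x2007, 0x2008, 0x2009, 0x200A, 0x200B, 0x2028, 0x2029, 0x3000,
    0xFEFF,
]

# Assumption: what Path.GetInvalidFileNameChars() returns on Windows.
INVALID_FILENAME = {ord(c) for c in '"<>|:*?\\/'} | set(range(0x20))

# Characters Trim() would strip even though they are not in the invalid set.
trimmed_but_valid = [cp for cp in TRIM_DEFAULT if cp not in INVALID_FILENAME]

for cp in trimmed_but_valid:
    print(f"U+{cp:04X}")
```

Under these assumptions only the five control characters U+0009 through U+000D appear in both sets; the other 20, including the ordinary space U+0020 and the no-break space U+00A0, are absent from the invalid set yet would be removed by Trim().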

Am I then correct that string.Trim() could potentially remove VALID leading/trailing characters from a filename, therefore corrupting the filename?

What exactly are the invalid leading/trailing characters for a filename in the Windows Vista OS? I understand that they are not necessarily the same as those of the file system itself, since the OS can run on different file systems.

Best Answer

Am I then correct that string.Trim() could potentially remove VALID leading/trailing characters from a filename, therefore corrupting the filename?

Yes. Even more so on a UNIX-like system, where leading and trailing spaces are ordinary filename characters: ' X' is a valid filename, distinct from ' x ' and from 'X'.
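A small Python illustration of the corruption risk: distinct names that differ only in surrounding spaces all collapse to the same string after trimming. Python's `str.strip()` with no argument is used here as a stand-in for .NET's `Trim()`; both remove leading and trailing whitespace, although their exact character sets are not identical.

```python
# Four distinct POSIX filenames that differ only in surrounding spaces.
names = [" X", "X", "X ", " X "]

# str.strip() stands in for .NET's Trim() (the exact whitespace sets differ).
trimmed = {name.strip() for name in names}

print(len(set(names)), "distinct names before trimming")
print(len(trimmed), "distinct name after trimming:", trimmed)
```

A class that trims on construction would therefore map all four files onto the single name 'X'.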