Bash – BSD – Remove non-ascii characters from all files in a directory recursively


I'm trying to migrate a large collection (300GB+) of files from a FAT32 drive to the ZFS filesystem on my FreeNAS box, but every command I try (tar, pax, mv, cp) fails with 'invalid argument' when it encounters a non-ASCII filename. These are usually files created under Windows, and the name shows up as something like "foo?s bar.mp3…", where the ? may originally have been an apostrophe or similar.

Can anyone help with a few lines of code that recursively walk the directory tree and rename files to remove the offending characters?

Much appreciated.

Best Answer

Try mounting the filesystem with the iocharset option set to the encoding it uses.

From man mount under the "Mount options for fat" section:

   iocharset=value
          Character set to use for converting between 8 bit characters and
          16 bit Unicode characters. The default is iso8859-1. Long
          filenames are stored on disk in Unicode format.

See also under the "Mount options for vfat" section:

   uni_xlate
          Translate unhandled Unicode characters to special escaped
          sequences. This lets you backup and restore filenames that are
          created with any Unicode characters. Without this option, a '?'
          is used when no translation is possible. The escape character is
          ':' because it is otherwise illegal on the vfat filesystem. The
          escape sequence that gets used, where u is the unicode
          character, is: ':', (u & 0x3f), ((u>>6) & 0x3f), (u>>12).


   utf8   UTF8 is the filesystem safe 8-bit encoding of Unicode that is
          used by the console. It can be enabled for the filesystem
          with this option or disabled with utf8=0, utf8=no or utf8=false.
          If `uni_xlate' gets set, UTF8 gets disabled.
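
To make the escape formula quoted above concrete, here is a small sketch that computes the three 6-bit fields for one code point. The function name uni_xlate_fields is made up for illustration, and the example code point U+2019 (the curly apostrophe Windows often uses) is only a guess at the kind of character involved. How the kernel turns each field into a printable character is not covered by the excerpt, so the sketch stops at the numbers.

```shell
#!/bin/sh
# Hypothetical helper: print the three fields that follow the ':' escape
# character, in the order given by the man page formula:
#   (u & 0x3f), ((u>>6) & 0x3f), (u>>12)
uni_xlate_fields() {
    u=$(( $1 ))    # accepts decimal or 0x-prefixed hex
    printf '%d %d %d\n' $(( u & 0x3f )) $(( (u >> 6) & 0x3f )) $(( u >> 12 ))
}

# U+2019 RIGHT SINGLE QUOTATION MARK (assumed example character)
uni_xlate_fields 0x2019    # prints: 25 0 2
```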


Sorry, that was for Linux. On BSD, the relevant options are (from man mount_msdosfs):

 -L locale
     Specify locale name used for file name conversions for DOS and
     Win'95 names.  By default ISO 8859-1 assumed as local character
     set.

 -D DOS_codepage
     Specify the MS-DOS code page (aka IBM/OEM code page) name used
     for file name conversions for DOS names.
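
If some names still fail to copy after remounting with a matching locale and code page, the renaming approach asked about in the question could look roughly like the sketch below. It is not from the man page: sanitize_tree is a hypothetical helper name, the choice of "_" as the replacement is arbitrary, and it assumes the FAT32 volume is mounted read-write and that the system can address the affected files by name at all. Try it on a copy, or with DRY_RUN=1, first.

```shell
#!/bin/sh
# Sketch only: rename everything below a directory so that each name
# contains just printable ASCII (bytes 0x20-0x7E); every other byte
# becomes "_". sanitize_tree is a hypothetical name, not an existing tool.
#
# Usage: sanitize_tree DIR              rename in place
#        DRY_RUN=1 sanitize_tree DIR    only print the planned renames
sanitize_tree() {
    # -depth visits children before their parent directory, so a
    # directory is renamed only after everything inside it is done.
    # -mindepth 1 leaves DIR itself untouched.
    DRY_RUN="${DRY_RUN:-0}" find "$1" -mindepth 1 -depth -exec sh -c '
        for path do
            dir=${path%/*}      # parent directory
            old=${path##*/}     # last path component
            new=$(printf "%s" "$old" | LC_ALL=C tr -c "\040-\176" "_")
            [ "$old" = "$new" ] && continue
            # Never overwrite: skip if the cleaned-up name is already taken.
            if [ -e "$dir/$new" ] || [ -L "$dir/$new" ]; then
                printf "skipped (target exists): %s\n" "$path" >&2
                continue
            fi
            printf "%s -> %s\n" "$path" "$dir/$new"
            [ "$DRY_RUN" = 1 ] || mv -- "$path" "$dir/$new"
        done' sh {} +
}
```

For example, with the drive mounted at /mnt/fat (a placeholder path), DRY_RUN=1 sanitize_tree /mnt/fat lists the planned renames and sanitize_tree /mnt/fat performs them. Each non-ASCII byte becomes one underscore, so a multi-byte UTF-8 character turns into several.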