Bash – How to manipulate accented files on Unix

bash unix

On our CMS, which has gone through many versions and hosting platforms, we just came across an image file with an accented name that would not load from its URL. So I ssh'd onto the box and tried to rename it.

However, I could not figure out how to type its name correctly. For instance, I know that if a file is called my file.txt you would do something like mv my\ file.txt my_new_file.txt, but how do you move a file called café.txt?

In the end, I used a wildcard and did mv caf*.txt cafe.txt, but I'm still wondering why this accented image would not work in the first place, and what would have been the proper way to handle it on Unix.

Best Answer

Using bash:

Just to see my files:

$ ls
café.txt

Check the hex bytes of the file name (note: your bytes may differ from mine, which are UTF-8 encoded):

$ echo * | hexdump -C
00000000  63 61 66 c3 a9 2e 74 78  74 0a                    |caf...txt.|
0000000a
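
When a directory holds more than one file, echo * runs all the names together on one line. A minimal sketch of dumping each name separately, assuming bash and the POSIX od utility; the temporary directory and the file names are only for illustration:

```shell
#!/usr/bin/env bash
# Demo in a throwaway directory; the file names here are illustrative.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch plain.txt "$(printf 'caf\303\251.txt')"   # \303\251 is octal for bytes c3 a9

# Dump each name separately so bytes from different files do not run together.
dump=$(for f in *; do
    printf '%s:' "$f"
    printf '%s' "$f" | od -An -tx1 | tr -s ' \n' ' '
    printf '\n'
done)
printf '%s\n' "$dump"
```

Using printf '%s' instead of echo also keeps the trailing newline (the 0a above) out of the dump.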

Then craft a file name using the hex codes for the parts that aren't found on your keyboard:

$ ls $'caf\xc3\xa9.txt'
café.txt

In bash, $' ... ' expands backslash escapes (much like "echo -e" does), and \x followed by a two-digit hex code is replaced by the byte with that value.
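
To check that the escapes produce the bytes you expect before touching any file, you can dump the expanded string itself. A small sketch, assuming bash; the variable names are arbitrary:

```shell
#!/usr/bin/env bash
# $'...' (ANSI-C quoting) turns \xHH escapes into raw bytes before the command runs.
name=$'caf\xc3\xa9.txt'

# Dump the bytes as hex with no separators, to compare against the hexdump of the real name.
hex=$(printf '%s' "$name" | od -An -tx1 | tr -d ' \n')
echo "$hex"
```

This prints 636166c3a92e747874, the same nine bytes the hexdump showed for the file name (without the trailing 0a, which came from echo).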

I also don't see anything wrong with running:

ls caf*.txt

followed by:

mv caf*.txt cafe.txt

But if the wildcard would match multiple files, you can spell out the name with hex escapes instead:

ls $'caf\xc3\xa9.txt'
mv $'caf\xc3\xa9.txt' cafe.txt
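
If you prefer to stay with the wildcard, one way to guard against the multiple-match case is to refuse the rename unless exactly one file matches. This is only a sketch, assuming bash; rename_single is a hypothetical helper name, and the temporary directory exists only so the demo cannot touch real files:

```shell
#!/usr/bin/env bash
# Work in a throwaway directory so the demo cannot touch real files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch $'caf\xc3\xa9.txt'       # the accented name, spelled as UTF-8 bytes

# rename_single (hypothetical helper): rename only when exactly one file matches.
rename_single() {
    local target=$1
    shift                      # the remaining arguments are the glob's matches
    if [ "$#" -eq 1 ] && [ -e "$1" ]; then
        mv -- "$1" "$target"
    else
        echo "expected exactly one existing match, got $#: $*" >&2
        return 1
    fi
}

rename_single cafe.txt caf*.txt
ls
```

The -e check covers the case where nothing matches, because bash then passes the pattern through unexpanded as a single literal argument.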