Linux – filename encoding problem when migrating a PHP application from Windows Server 2003 to Linux

encoding, linux, migration, PHP, windows

I have a couple of PHP applications currently running on Windows Server 2003. Since they already use PHP, MySQL and even Apache on Windows, the plan is to move them to a new Linux server (Debian-based).

But I have a problem with files uploaded by users when 'special characters' (non-ASCII characters such as éèàç, which are common in French) are used in file names.

For example, the file "accusé réception.pdf" is listed as:

$ ls
accus? r?ception.pdf

There seems to be no problem when I upload a new file on the Linux server: it shows up with a similar name on the filesystem, but the application can find it. The problem is with the migrated content: the file is there, but the application can't find it!

I wonder where the problem comes from:

  • the filesystem's character table/encoding (I think this is the cause)
  • the PHP code of the applications themselves, which would be a problem since I can't change it. I can file bug reports, but I'm not sure when they would be fixed.
  • something else
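To illustrate the first hypothesis: on Linux a filename is a plain byte string, so the same visible name can sit on disk as different bytes depending on the encoding used when the files were copied. The short Python sketch below assumes the migrated copies carry cp1252 bytes (the Windows "ANSI" code page for Western Europe); that is a guess, not something the question establishes.

```python
# Same visible filename, two different byte sequences.
name = "accusé réception.pdf"

utf8_bytes = name.encode("utf-8")      # what a UTF-8 Linux setup typically writes
cp1252_bytes = name.encode("cp1252")   # what a copy in Windows "ANSI" (cp1252) may carry

print(utf8_bytes)    # b'accus\xc3\xa9 r\xc3\xa9ception.pdf'
print(cp1252_bytes)  # b'accus\xe9 r\xe9ception.pdf'

# Linux compares filename bytes exactly, so a lookup with one form misses the other.
print(utf8_bytes == cp1252_bytes)  # False
```

If the application builds the path with one form while the migrated file on disk uses the other, the file "exists" but is never found.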

Above all, I need to find a way to fix this. Since it only happens with migrated data, I could write a script or tune my filesystem/PHP/whatever to solve it before putting these applications into production on the Linux server.
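A minimal sketch of such a script in Python, assuming (not confirmed by the question) that the migrated names are cp1252 byte strings rather than UTF-8. It walks the tree bottom-up and re-encodes every name that is not already valid UTF-8; treating names that decode as UTF-8 as already correct is a heuristic. The script name, function names and the `--apply` flag are hypothetical, and without `--apply` it only reports what it would do.

```python
import os
import sys
from typing import List, Optional, Tuple

# Assumption: migrated names were written in cp1252 (Windows "ANSI", Western Europe).
SOURCE_ENCODING = "cp1252"


def to_utf8(name: bytes, source_encoding: str = SOURCE_ENCODING) -> Optional[bytes]:
    """Return `name` re-encoded as UTF-8, or None if it cannot be decoded.

    Names that already decode as UTF-8 (including plain ASCII) are returned unchanged.
    """
    try:
        name.decode("utf-8")
        return name
    except UnicodeDecodeError:
        pass
    try:
        return name.decode(source_encoding).encode("utf-8")
    except UnicodeDecodeError:
        return None  # byte undefined in the source encoding: leave it for manual review


def fix_tree(root: bytes, dry_run: bool = True) -> List[Tuple[bytes, bytes]]:
    """Rename every non-UTF-8 entry under `root`; return the (old, new) path pairs."""
    renames = []
    # Bottom-up walk: a directory's contents are handled before the directory itself
    # is renamed, so the paths yielded by os.walk stay valid.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in dirnames + filenames:
            new_name = to_utf8(name)
            if new_name is None:
                print(f"cannot decode, skipped: {os.path.join(dirpath, name)!r}", file=sys.stderr)
                continue
            if new_name == name:
                continue
            old_path = os.path.join(dirpath, name)
            new_path = os.path.join(dirpath, new_name)
            if os.path.lexists(new_path):
                print(f"target exists, skipped: {new_path!r}", file=sys.stderr)
                continue
            renames.append((old_path, new_path))
            if not dry_run:
                os.rename(old_path, new_path)
    return renames


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 fix_names.py /path/to/uploads [--apply]
    apply = "--apply" in sys.argv[2:]
    for old, new in fix_tree(os.fsencode(sys.argv[1]), dry_run=not apply):
        print(f"{'renamed' if apply else 'would rename'}: {old!r} -> {new!r}")
```

Run it once without `--apply` to review the list, and back up the tree before renaming. Debian also packages a dedicated tool, convmv, for this kind of bulk rename.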

Thanks in advance for your help.

Note: when the application can't find a file, my Apache logs are filled with 'readdir() expects parameter 1 to be resource, boolean given in …' errors

Best Answer

Windows stores non-ASCII characters in filenames as Unicode (UTF-16 on NTFS), so if you're using a Unicode (UTF-8) locale on your Debian server you should be set. It doesn't have to be a French locale just because the characters you're trying to use are a French speciality: I just tested this with my LANG set to en_US.UTF-8, and I can create a file with the name you mentioned ("accusé réception.pdf") and it shows up that way as well.

Chances are the accents are there; they just can't be displayed. To test this theory, replace that "ls" command with "LANG=en_US.UTF-8 ls". If the names show up correctly, it's just your terminal. In that case, set your LANG variable in your shell's startup file (e.g. .bashrc) or system-wide in /etc/default/locale.
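If the names still show "?" under a UTF-8 locale, the bytes on disk may not be UTF-8 at all. The minimal Python sketch below (script and function names are hypothetical) lists a directory's entries as raw bytes and labels each one, independently of the terminal settings. It may also be worth noting that each accented letter in the question's "ls" output became a single "?"; since é is two bytes in UTF-8 but one byte in cp1252/ISO-8859-1, that would be consistent with a single-byte encoding, though this is an inference rather than a confirmed diagnosis.

```python
import os
import sys
from typing import List, Tuple


def classify(name: bytes) -> str:
    """Label a raw filename as 'ascii', 'utf-8' or 'not-utf-8'."""
    if name.isascii():
        return "ascii"
    try:
        name.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "not-utf-8"  # e.g. a single-byte code page such as cp1252 or ISO-8859-1


def report(directory: bytes) -> List[Tuple[str, bytes]]:
    """Return (label, raw name) for every entry in `directory`, sorted by name."""
    return [(classify(name), name) for name in sorted(os.listdir(directory))]


if __name__ == "__main__":
    # Usage: python3 show_names.py [directory]   (defaults to the current directory)
    target = os.fsencode(sys.argv[1] if len(sys.argv) > 1 else ".")
    for label, name in report(target):
        # repr() of bytes shows escapes: b'accus\xc3\xa9...' is UTF-8, b'accus\xe9...' is single-byte
        print(f"{label:<10} {name!r}")
```

Running it on a migrated directory and on one holding a freshly uploaded file should show whether the two use different encodings for the same-looking name.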