Filename encoding switched to UTF-8 by tar when untarring to Windows share

cifs encoding mount tar

We have several Magento installations (webshops) that allow images to be added to a product freely. When an image is added to a product, the file is named in a specific way that sometimes incorporates special characters (for instance German umlauts).

In the one case I'm currently looking into, the filenames are encoded in latin1. I can see that by redirecting the output of ls into a file and then opening that file in vim: with fileencoding=latin1, the umlauts are displayed correctly.
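As a rough cross-check of the vim approach, the raw bytes of each filename can be tested directly. The following is a minimal sketch, not part of the original setup: the function names are made up for illustration, and a name that fails strict UTF-8 decoding is only assumed to be latin1 (it could be any other 8-bit encoding).

```python
import os


def guess_name_encoding(raw: bytes) -> str:
    """Classify a raw filename by whether its bytes are ASCII or valid UTF-8."""
    if raw.isascii():
        return "ascii"
    try:
        raw.decode("utf-8")  # strict decoding: raises on invalid sequences
        return "utf-8"
    except UnicodeDecodeError:
        # Assumption: anything else is an 8-bit encoding such as latin1.
        return "not utf-8 (possibly latin-1)"


def scan_directory(path: bytes) -> dict:
    """Map each raw filename in `path` to its guessed encoding."""
    # Passing bytes makes os.listdir return the names as undecoded bytes.
    return {name: guess_name_encoding(name) for name in os.listdir(path)}
```

Calling scan_directory(b"var/magento_webs/customer/media/import/images") on the Linux side would list every name together with its classification, without involving the terminal's own character set.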

Now, these Magento installations are backed up by tar, 7zip and ccrypt (in that order). Unpacking those on Linux gives the same filenames in the same encoding.

We now have a share on a Windows system onto which we want to put the untarred Magento installation. While untarring, however, a number of error messages appear regarding the files with umlauts in their names:

tar: var/magento_webs/customer/media/import/images/12063-sportsto\337d\344mpfer-hinten.jpg: Kann open nicht ausführen: Datei oder Verzeichnis nicht gefunden
tar: var/magento_webs/customer/media/import/images/15240-kunststoffkotfl\374gel-detail-vorne.jpg: Kann open nicht ausführen: Datei oder Verzeichnis nicht gefunden
tar: var/magento_webs/customer/media/import/images/14300-fl\374gel.jpg: Kann open nicht ausführen: Datei oder Verzeichnis nicht gefunden
tar: var/magento_webs/customer/media/import/images/15240-41kotfl\374gel-kunststoff-vorne.jpg: Kann open nicht ausführen: Datei oder Verzeichnis nicht gefunden
tar: var/magento_webs/customer/media/import/images/citr\366n.jpg: Kann open nicht ausführen: Datei oder Verzeichnis nicht gefunden
tar: var/magento_webs/customer/media/import/images/2cv6-ma\337e-1.jpg: Kann open nicht ausführen: Datei oder Verzeichnis nicht gefunden
tar: var/magento_webs/customer/media/import/images/2cv6-ma\337e.jpg: Kann open nicht ausführen: Datei oder Verzeichnis nicht gefunden
tar: var/magento_webs/customer/media/import/images/11076-vorschalld\344mpfer.jpg: Kann open nicht ausführen: Datei oder Verzeichnis nicht gefunden

(This roughly translates to "Cannot open: No such file or directory".)
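The \NNN sequences in these messages are octal escapes for single bytes, so they can be turned back into the raw filename bytes and tested against candidate encodings. This is a small sketch under stated assumptions: the helper names are hypothetical, only three-digit octal escapes are handled (an escaped backslash is not), and latin1 is simply used as the fallback decoding.

```python
import re


def unescape_tar_name(escaped: str) -> bytes:
    """Turn three-digit octal escapes (\\NNN) back into raw bytes."""
    def to_byte(match):
        # int() accepts bytes when a base is given; base 8 for octal.
        return bytes([int(match.group(1), 8)])

    return re.sub(rb"\\([0-7]{3})", to_byte, escaped.encode("ascii"))


def describe(escaped: str) -> str:
    """Report whether the unescaped name decodes as UTF-8 or only as latin1."""
    raw = unescape_tar_name(escaped)
    try:
        return "utf-8: " + raw.decode("utf-8")
    except UnicodeDecodeError:
        # latin1 maps every byte to a character, so this never fails.
        return "latin-1: " + raw.decode("latin-1")


for name in (r"2cv6-ma\337e.jpg", r"citr\366n.jpg", r"14300-fl\374gel.jpg"):
    print(describe(name))
```

A name that only decodes in the fallback branch is not valid UTF-8 as it stands, which helps to narrow down where a conversion does or does not happen.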

Now, looking at the filenames tar lists, I can see that tar seems to try to create UTF-8 encoded filenames (\337 looks that way). However, the share is mounted with the following entry in /etc/fstab:

//192.168.0.111/share   /mnt/share      cifs    username=myusername,noperm,sec=ntlm,codepage=cp850       0       0

I'm not sure why these file names cannot be written to the share in a fashion that preserves the umlaut encoding. Am I missing another option (is codepage the wrong option for this)?

Edit 1: I can recreate something similar by SSHing into the Linux box, setting the remote character set of the connection to ISO8859-15, changing to the share directory and touching a file with an umlaut:

touch: kann â\244â nicht berÃŒhren: Datei oder Verzeichnis nicht gefunden

(Cannot touch X: No such file or directory)

Edit 2: First try of a solution

I've added iocharset=utf8 to the mount options and remounted the share, but got the exact same problems with the same files. Strangely enough, mount (which usually prints all the options a mount point has been mounted with) doesn't print the iocharset option (neither with utf8 nor with cp850 as the setting).

Best Answer

Some time in the past (I believe around version 2.0 or so), mount.cifs lost the "codepage=" option and put everything into the "iocharset=" option.

You should be fine with

//host/share /mnt/share cifs username=blah,noperm,sec=ntlm,iocharset=utf8 0 0
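Since the question mentions that mount does not show the iocharset option, one way to check what the kernel actually applied is to read the mount table directly. The sketch below is only an illustration: the function names are made up, /proc/mounts is assumed to be available, and whether cifs lists iocharset there may depend on the kernel version.

```python
def mount_options(mounts_text: str, mount_point: str) -> list:
    """Return the option list for `mount_point` from /proc/mounts-style text."""
    options = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: device, mount point, fs type, options, dump, pass.
        # Note: spaces in mount points appear as \040 in /proc/mounts.
        if len(fields) >= 4 and fields[1] == mount_point:
            options = fields[3].split(",")  # keep the last (topmost) match
    return options


def read_mount_options(mount_point: str, table: str = "/proc/mounts") -> list:
    """Read the live mount table and look up `mount_point`."""
    with open(table, encoding="utf-8", errors="replace") as handle:
        return mount_options(handle.read(), mount_point)
```

For example, "iocharset=utf8" in read_mount_options("/mnt/share") would indicate whether the option took effect after remounting.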