German has some special characters like ä, ü, ö and ß. They aren't very search-engine friendly, so I want to replace them in the file names of uploaded catalog images. That doesn't sound that complicated, I know… but:
The method responsible for this is getCorrectFileName() in the class Varien_File_Uploader, which looks like this:
    /**
     * Correct filename with special chars and spaces
     *
     * @param string $fileName
     * @return string
     */
    public function getCorrectFileName($fileName)
    {
        $fileName = preg_replace('/[^a-z0-9_\\-\\.]+/i', '_', $fileName);
        $fileInfo = pathinfo($fileName);
        if (preg_match('/^_+$/', $fileInfo['filename'])) {
            $fileName = 'file.' . $fileInfo['extension'];
        }
        return $fileName;
    }
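To make the behaviour of that method concrete, here is a minimal standalone sketch. The function name correctFileNameSketch() is hypothetical; its body is copied from the method quoted above so it can run outside Magento. The pattern has no /u modifier, so it works byte by byte, and every run of bytes outside a-z, 0-9, "_", "-" and "." collapses into a single underscore.

```php
<?php
// Hypothetical standalone copy of the logic in getCorrectFileName(),
// only so the behaviour can be tried without a Magento installation.
function correctFileNameSketch($fileName)
{
    // Replace every run of disallowed bytes with one underscore.
    $fileName = preg_replace('/[^a-z0-9_\\-\\.]+/i', '_', $fileName);
    $fileInfo = pathinfo($fileName);
    // If nothing but underscores is left of the base name, fall back to "file".
    if (preg_match('/^_+$/', $fileInfo['filename'])) {
        $fileName = 'file.' . $fileInfo['extension'];
    }
    return $fileName;
}

echo correctFileNameSketch('my file (1).jpg'), "\n";  // my_file_1_.jpg
echo correctFileNameSketch("t\xC3\xA4st.jpg"), "\n";  // t_st.jpg ("ä" as bytes C3 A4)
echo correctFileNameSketch('???.jpg'), "\n";          // file.jpg
```

This is why the umlauts have to be transliterated before the preg_replace() call: afterwards only underscores are left.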
We could simply extend it with strtr(), str_replace() or even preg_replace() before the original preg_replace() call, which could look like this:
    $fileName = strtr($fileName, [
        'Ä' => 'Ae',
        'ä' => 'ae',
        'Ö' => 'Oe',
        'ö' => 'oe',
        'Ü' => 'Ue',
        'ü' => 'ue',
        'ß' => 'ss',
    ]);
But no matter which of these functions I use, it ignores the special characters in the file name. When I override $fileName at the beginning with a hard-coded string like "täst-file.jpg", everything works fine.
My first idea was an encoding issue, so I tried to convert the string to UTF-8, but mb_detect_encoding($fileName, 'UTF-8', true) said it was already a valid UTF-8 encoded string, and I ran out of ideas.
Does anyone know what I'm doing wrong or what a solution for this problem is?
Thanks in advance. 🙂
Best Answer
Okay, here's a solution...
All the common ways to replace characters in a string, like str_replace(), preg_replace(), strtr(), ..., did NOT work in this special case. I don't know exactly which encoding (server, PHP, etc.) is responsible for the special characters not being detected, but converting the characters to ASCII finally worked.
After splitting the filename via str_split(), I noticed that I got a sequence of 3 values for every German special character, e.g. "ä" (a, Ì, ˆ) or "Ü" (U, Ì, ˆ), so my solution may look a bit weird.
I think this solution could also work for special characters in other languages. Just split the string and check via ord() which byte values you get.
I hope that helps.
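What follows is not the original code from this answer, only a minimal sketch of the idea it describes. The three values per character (base letter, Ì, ˆ) are consistent with a decomposed umlaut: the plain letter followed by U+0308 COMBINING DIAERESIS, which is the byte pair 0xCC 0x88 in UTF-8 and shows up as "Ì" and "ˆ" when viewed in a single-byte encoding such as Windows-1252. That is an assumption about the uploaded names, but it would explain why a map keyed on the precomposed "ä" never matches while the string is still valid UTF-8. The helper name transliterateGermanSketch() is hypothetical.

```php
<?php
// Inspect the bytes of a decomposed "ä": base letter + combining diaeresis.
echo bin2hex("a\xCC\x88"), "\n";                                   // 61cc88
echo implode(',', array_map('ord', str_split("a\xCC\x88"))), "\n"; // 97,204,136

/**
 * Hypothetical helper (not part of Magento): transliterates German umlauts
 * in both decomposed and precomposed UTF-8 form to plain ASCII.
 */
function transliterateGermanSketch($fileName)
{
    // U+0308 COMBINING DIAERESIS encoded as UTF-8.
    $diaeresis = "\xCC\x88";

    return strtr($fileName, [
        // Decomposed form: base letter followed by the combining diaeresis.
        'A' . $diaeresis => 'Ae',
        'a' . $diaeresis => 'ae',
        'O' . $diaeresis => 'Oe',
        'o' . $diaeresis => 'oe',
        'U' . $diaeresis => 'Ue',
        'u' . $diaeresis => 'ue',
        // Precomposed form, written as explicit bytes so the mapping does
        // not depend on the encoding of this source file.
        "\xC3\x84" => 'Ae', // Ä
        "\xC3\xA4" => 'ae', // ä
        "\xC3\x96" => 'Oe', // Ö
        "\xC3\xB6" => 'oe', // ö
        "\xC3\x9C" => 'Ue', // Ü
        "\xC3\xBC" => 'ue', // ü
        "\xC3\x9F" => 'ss', // ß
    ]);
}

echo transliterateGermanSketch("ta\xCC\x88st-file.jpg"), "\n"; // taest-file.jpg
echo transliterateGermanSketch("t\xC3\xA4st-file.jpg"), "\n";  // taest-file.jpg
```

In an override of getCorrectFileName() this would run before the original preg_replace() call. If the intl extension is available, calling Normalizer::normalize($fileName, Normalizer::FORM_C) first and keeping only the precomposed entries is another option.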