Magento 1.9 Character Encoding – Replace Special Characters in Catalog Image Names After Upload

character-encoding, magento-1.9

German has some special characters such as ä, ü, ö and ß. They aren't very search-engine friendly, so I want to replace them in the file names of uploaded catalog images. That doesn't sound complicated, I know… but:

The method responsible for this is getCorrectFileName() in the class Varien_File_Uploader, which looks like this:

/**
 * Correct filename with special chars and spaces
 *
 * @param string $fileName
 * @return string
 */
public function getCorrectFileName($fileName)
{
    $fileName = preg_replace('/[^a-z0-9_\\-\\.]+/i', '_', $fileName);
    $fileInfo = pathinfo($fileName);

    if (preg_match('/^_+$/', $fileInfo['filename'])) {
        $fileName = 'file.' . $fileInfo['extension'];
    }
    return $fileName;
}

We could simply extend it with strtr(), str_replace() or even preg_replace() before the original preg_replace(), which could look like this:

$fileName = strtr($fileName, [
    'Ä' => 'Ae',
    'ä' => 'ae',
    'Ö' => 'Oe',
    'ö' => 'oe',
    'Ü' => 'Ue',
    'ü' => 'ue',
    'ß' => 'ss',
]);

But no matter which of these functions I use, the special characters in the file name are ignored. When I override $fileName at the beginning with a hard-coded string like "täst-file.jpg", everything works fine.

My first guess was an encoding issue, so I tried to convert the string to UTF-8, but mb_detect_encoding($fileName, 'UTF-8', true) reported that it was already a valid UTF-8 string, and I ran out of ideas.

Does anyone know what I'm doing wrong or what a solution for this problem is?

Thanks in advance. 🙂

Best Answer

Okay, here's a solution...

None of the common ways to replace characters in a string (str_replace(), preg_replace(), strtr(), ...) worked in this special case.

I don't know exactly which encoding setting (server, PHP, etc.) is responsible for the special characters not being detected, but converting the characters to ASCII byte by byte finally worked.

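After splitting the file name with str_split(), I noticed that I got a sequence of three bytes for every German umlaut: "ä" came out as (a, Ì, ˆ) and "Ü" as (U, Ì, ˆ), i.e. the base letter followed by the bytes 204 and 136 (0xCC 0x88, the UTF-8 encoding of the combining diaeresis U+0308). That is why my solution may look a bit weird: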

public function getCorrectFileName($fileName)
{
    // @HACK Replace German special characters with their ASCII equivalents (ä -> ae)
    $split      = str_split($fileName); // one element per byte, not per character
    $splitCount = count($split);
    $fileName   = '';
    for ($i = 0; $i < $splitCount; $i++) {
        $byte  = ord($split[$i]);
        // Look ahead safely so the last bytes of the name do not trigger notices
        $next  = isset($split[$i + 1]) ? ord($split[$i + 1]) : null;
        $next2 = isset($split[$i + 2]) ? ord($split[$i + 2]) : null;
        if (in_array($byte, [65, 79, 85, 97, 111, 117]) && $next === 204 && $next2 === 136) {
            // A, O, U, a, o or u followed by 0xCC 0x88 (combining diaeresis)
            $fileName .= $split[$i] . 'e';
            $i += 2; // skip the two bytes of the combining mark
        } elseif ($byte === 195 && $next === 159) {
            // 0xC3 0x9F is the UTF-8 encoding of ß
            $fileName .= 'ss';
            $i++; // skip the second byte of ß
        } else {
            $fileName .= $split[$i];
        }
    }
    // @HACK END
    $fileName = preg_replace('/[^a-z0-9_\\-\\.]+/i', '_', $fileName);
    // Inspect the sanitized name, as the original method does
    $fileInfo = pathinfo($fileName);

    if (preg_match('/^_+$/', $fileInfo['filename'])) {
        $fileName = 'file.' . $fileInfo['extension'];
    }
    return $fileName;
}
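
The byte values 204 and 136 checked above indicate that the umlauts arrive in decomposed form (base letter plus combining mark), which would explain why a map keyed on precomposed characters such as 'ä' never matches. As a rough sketch of an alternative, assuming the intl extension may or may not be installed, the name could first be normalized to composed form and then passed through strtr(). The function name transliterateGermanChars() is made up for this illustration and is not part of Magento:

```php
<?php
/**
 * Hypothetical helper (not part of Magento): transliterate German special
 * characters whether they arrive precomposed (NFC) or decomposed (NFD).
 */
function transliterateGermanChars($fileName)
{
    // If the intl extension is loaded, compose "a" + U+0308 into a single "ä".
    if (class_exists('Normalizer')) {
        $normalized = Normalizer::normalize($fileName, Normalizer::FORM_C);
        if ($normalized !== false) {
            $fileName = $normalized;
        }
    }

    $diaeresis = "\xCC\x88"; // UTF-8 bytes of U+0308 COMBINING DIAERESIS

    // Keys are written as byte escapes so the result does not depend on
    // the encoding this source file is saved in.
    return strtr($fileName, array(
        // precomposed forms
        "\xC3\x84" => 'Ae', // Ä
        "\xC3\xA4" => 'ae', // ä
        "\xC3\x96" => 'Oe', // Ö
        "\xC3\xB6" => 'oe', // ö
        "\xC3\x9C" => 'Ue', // Ü
        "\xC3\xBC" => 'ue', // ü
        "\xC3\x9F" => 'ss', // ß
        // decomposed forms, used when intl is not available
        'A' . $diaeresis => 'Ae',
        'a' . $diaeresis => 'ae',
        'O' . $diaeresis => 'Oe',
        'o' . $diaeresis => 'oe',
        'U' . $diaeresis => 'Ue',
        'u' . $diaeresis => 'ue',
    ));
}
```

Calling this at the top of getCorrectFileName(), before the preg_replace(), should give the same result as the byte loop for these characters, but I have only reasoned about the German cases listed here.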

I think the byte-splitting approach above could also work for special characters in other languages. Just split the string and check with ord() which byte values you get...
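
To make that inspection step concrete, here is a small sketch of a debugging helper that lists every byte of a string with its ord() value. The name dumpBytes() is just an illustration, not an existing PHP or Magento function:

```php
<?php
/**
 * Hypothetical debugging helper: list each byte of a string
 * as "decimal (hex)" so multi-byte sequences become visible.
 */
function dumpBytes($value)
{
    $parts  = array();
    $length = strlen($value); // strlen() counts bytes, not characters
    for ($i = 0; $i < $length; $i++) {
        $byte    = ord($value[$i]);
        $parts[] = sprintf('%d (0x%02X)', $byte, $byte);
    }
    return implode(' ', $parts);
}

// Decomposed "ä": the letter a plus the combining diaeresis
echo dumpBytes("a\xCC\x88"), "\n"; // 97 (0x61) 204 (0xCC) 136 (0x88)

// Precomposed "ä": a single two-byte character
echo dumpBytes("\xC3\xA4"), "\n";  // 195 (0xC3) 164 (0xA4)
```

Running it on the uploaded file name shows which of the two forms you are dealing with, and therefore which byte sequences a replacement has to look for.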

I hope that helps.