Web-server – How to serve HTTP filenames with special characters

encoding, http, i18n, utf-8, web-server

Take the following blog page as a case:

http://www.roney.com.br/2010/06/20/estados-do-brasil-um-pais-que-precisa-se-unir/

Careful: it has tons of YouTube videos embedded, so it loads slowly. It is a Brazilian web page, written in Portuguese but hosted (according to the blog's owner) on a USA webhost.

Of interest are the "Pronúncia" links, which point to file names containing non-ASCII characters. Look at the second one (for Pará): as I write this, the link is to www.roney.com.br/wp-content/uploads/2010/06/par%E1.mp3 (unless he changes it out from under me in the future :)).

As you can see, he has it percent-encoded, but what you can't tell is what he actually named the file on his file system or how the server is configured.

If I click it in my Firefox browser I get their 404 page. He claims those links are working for Brazilian visitors. I thought this was a 100% server thing, i.e. either the server will serve it or it won't. Just for laughs I set the preferred language to Portuguese in my Firefox but as I suspected, it didn't make any difference.

Anyone care to offer any insight on how this might work in Brazil but not in the USA, or what I could tweak on my own workstation so that the files would be served for me too?

Best Answer

The problem lies in the URI encoding. Here the "á" is encoded as ISO-8859-1 (Latin-1) and then percent-encoded, which gives the single byte %E1. RFC 3986 recommends encoding the character as UTF-8 first and then percent-encoding it, which gives the two bytes %C3%A1.

Source:

More info about percent-encoding on Wikipedia.

The actual RFC 3986.
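To make the difference between the two encodings concrete, here is a minimal sketch in PHP. The variable names are only illustrative, and the byte escapes are written out by hand so that the result does not depend on how the script file itself is saved.

```php
<?php
// "pará.mp3" as raw bytes in each encoding.
$latin1Name = "par\xE1.mp3";     // ISO-8859-1: "á" is the single byte 0xE1
$utf8Name   = "par\xC3\xA1.mp3"; // UTF-8: "á" is the two bytes 0xC3 0xA1

// rawurlencode() percent-encodes every byte outside the unreserved set.
$latin1Encoded = rawurlencode($latin1Name); // the form used in the link above
$utf8Encoded   = rawurlencode($utf8Name);   // the form RFC 3986 recommends

echo $latin1Encoded, "\n"; // par%E1.mp3
echo $utf8Encoded, "\n";   // par%C3%A1.mp3
```

Both strings name the same file to a human reader, but to the server they are different byte sequences, so only the one that matches the name stored on disk will resolve.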

Solution:

To give you an idea of how to solve this, you can do something like this in PHP.

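<?php
// Decode the Latin-1 percent-escape, convert the bytes to UTF-8, then re-encode.
// iconv() is used because utf8_encode() is deprecated as of PHP 8.2.
$latin1 = rawurldecode('par%E1.mp3');
echo rawurlencode(iconv('ISO-8859-1', 'UTF-8', $latin1)); // par%C3%A1.mp3
?>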

Note that if you pass the whole URI, the slashes (/) will be encoded as well, making the URI invalid.
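One way around the slash problem is to re-encode the path one segment at a time. The sketch below assumes that every segment of the path is Latin-1 percent-encoded (a segment that is already UTF-8 would end up double-encoded), and `reencodePathAsUtf8` is a hypothetical helper name, not an existing PHP function.

```php
<?php
// Hypothetical helper: re-encode a Latin-1 percent-encoded path as UTF-8,
// one segment at a time so that the "/" separators are left untouched.
function reencodePathAsUtf8(string $path): string
{
    $segments = explode('/', $path);
    foreach ($segments as $i => $segment) {
        // Assumption: the segment's percent-escapes are ISO-8859-1 bytes.
        $latin1 = rawurldecode($segment);
        $utf8 = iconv('ISO-8859-1', 'UTF-8', $latin1);
        $segments[$i] = rawurlencode($utf8);
    }
    return implode('/', $segments);
}

$fixedPath = reencodePathAsUtf8('/wp-content/uploads/2010/06/par%E1.mp3');
echo $fixedPath, "\n"; // /wp-content/uploads/2010/06/par%C3%A1.mp3
```

Only the path is handled here; the scheme, host and any query string would need to be split off first (for example with `parse_url()`) and left alone.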
