Apache2: Problems matching accented characters in query string using RewriteCond & RewriteRule

apache-2.2 mod-rewrite utf-8

I’m working on a site where the plan is to move URLs from a query string format to a number-based format. Many existing URLs contain unescaped accented and similar UTF-8 characters. The problem: I can’t get Apache2 to match the accented characters and perform the rewrite. I am doing all of this in the Apache2 config.

For example, this URL:

http://great.website.example.com/?place=cafe

Will work as expected with this Apache2 RewriteRule setting:

  RewriteCond %{QUERY_STRING} ^(place|location)=cafe
  RewriteRule ^/find/$ /find/1234? [L,R=301]

Now look at this URL. Note the accented é:

http://great.website.example.com/?place=café

Why doesn’t that URL work with the following Apache2 RewriteRule setting?

  RewriteCond %{QUERY_STRING} ^(place|location)=café
  RewriteRule ^/find/$ /find/1234? [L,R=301]

Both of these rules should rewrite the URL to the following:

http://great.website.example.com/find/1234

But the example with the accented é simply doesn’t work. Maybe a wildcard character would work, but I can’t seem to get that to work either.

Best Answer

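Browsers typically send the é percent-encoded (caf%C3%A9 when the encoding is UTF-8), so the literal café in your pattern never matches the raw query string. You can use a RewriteMap to do the unescaping for you, like this: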

RewriteMap unescape int:unescape

RewriteCond %{QUERY_STRING}  (location|place)=(.*)
RewriteCond ${unescape:%2}   café
RewriteRule ^/find/$         /find/1234? [L,R]

In the second RewriteCond line I use %2, as %1 would contain either "location" or "place".
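
If you would rather not define a map for a single rule, one alternative is to match the encoded bytes directly in the pattern. This is only a sketch, and it assumes the client sends é as the UTF-8 percent-encoding %C3%A9; a client using ISO-8859-1 would send %E9 instead and would not match.

```apache
# Assumption: the client percent-encodes é as UTF-8 (%C3%A9).
# In a CondPattern the % sign is an ordinary literal character.
# [NC] accepts lowercase hex digits (%c3%a9) too, and also makes
# the rest of the pattern case-insensitive.
RewriteCond %{QUERY_STRING} ^(place|location)=caf%C3%A9 [NC]
RewriteRule ^/find/$ /find/1234? [L,R]
```

The unescape map is usually the more readable choice, because the pattern then contains the word as a human would type it.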

However, adding a lot of RewriteRules to your config in order to map words to numbers is going to be a big performance hit on your server, and will be hard to maintain. A better solution is to use a RewriteMap for that too.

For example, assume that /etc/apache2/places.txt contains:

café    1234
shop   1235
...

Then this would work for you:

RewriteMap unescape int:unescape
RewriteMap places   txt:/etc/apache2/places.txt

RewriteCond %{QUERY_STRING}  (location|place)=(.*)
RewriteCond ${unescape:%2}   (.*)
RewriteRule ^/find/$         /find/${places:%1}? [L,R]
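
One thing to watch with the map lookup is a place that is not listed in places.txt: a txt map returns an empty string for an unknown key, so the rule above would redirect to /find/. A possible variant, offered as a sketch rather than a tested configuration, does the lookup in a third RewriteCond and only redirects when the result is numeric. The stricter query string pattern (stopping at the next &) is my own addition and not part of the rules above.

```apache
RewriteMap unescape int:unescape
RewriteMap places   txt:/etc/apache2/places.txt

# Capture the value of place= or location=, up to the next parameter.
RewriteCond %{QUERY_STRING}  (?:^|&)(?:location|place)=([^&]+)
# %1 is the raw value captured above; capture its unescaped form.
RewriteCond ${unescape:%1}   (.+)
# %1 is now the unescaped word; an unknown key yields an empty
# string, which fails this pattern, so no redirect happens.
RewriteCond ${places:%1}     ^([0-9]+)$
# %1 is now the number found in the map.
RewriteRule ^/find/$         /find/%1? [L,R]
```

Each %1 refers to the capture of the most recently matched RewriteCond, which is why the same backreference can be reused down the chain.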

You can also use a RewriteMap based on a database query. That would be my preferred choice, as I could then offload the job of matching words to numbers to the content management system.
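
A live database query is not shown here, but as a rough illustration of swapping the map type, the same rules can point at a DBM hash file instead of the text file. Note that this is a hash file on disk, not a query against the CMS database, and the path /etc/apache2/places.map is a hypothetical name; the file would have to be generated from the word-to-number data beforehand.

```apache
RewriteMap unescape int:unescape
# Hypothetical path; a DBM hash file built separately from the text map.
RewriteMap places   dbm:/etc/apache2/places.map

RewriteCond %{QUERY_STRING}  (location|place)=(.*)
RewriteCond ${unescape:%2}   (.*)
RewriteRule ^/find/$         /find/${places:%1}? [L,R]
```

Only the RewriteMap line changes; the conditions and the rule stay the same whichever map type supplies the numbers.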

You can find more details in the documentation: http://httpd.apache.org/docs/2.2/mod/mod_rewrite.html#rewritemap
