Apache encoding URI causing 404s

.htaccessapache-2.4mod-rewrite

This is a strange issue I'm having. I am using a framework that is called after mod_rewrite rewrites the URL. If a character in the URI is accented, as in the Latin alphabet, the request is never sent to my framework, but rather, the server errors out with a 404. I'm using a Windows machine, so not sure if this has something to do with it or not. As long as there are NO accented characters in the URI, then the request is sent to the framework without any issues. Can someone please tell me what's going on here and how to solve this?

EDIT: Here are 2 lines from my access.log. The first line shows the 404 where the accented Å was encoded by Apache and was not passed. When I change the Å in the URI to an English "A", everything works as expected.

127.0.0.1 - - [20/Oct/2017:13:50:50 -0400] "GET /actor/%C3%85ker HTTP/1.1" 404 222
127.0.0.1 - - [20/Oct/2017:13:54:48 -0400] "GET /actor/Aker HTTP/1.1" 200 5701

EDIT:
These are the lines in the .htaccess file

RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule .* index.php?request=$1 [L,QSA]

This is part of a debug log consisting of 2 people that shows that for one of them, mod_rewrite is not forwarding the request to my framework, yet the other is.

This is the one that fails:

[Fri Oct 20 17:52:48.119655 2017] [rewrite:trace3] [pid 1620:tid 792] mod_rewrite.c(480): [client 127.0.0.1:52846] 127.0.0.1 - - [ansac.com/sid#2de668][rid#1fec230/initial] [perdir C:/Development/Apache24/htdocs/ansac/] add path info postfix: C:/Development/Apache24/htdocs/ansac/actor -> C:/Development/Apache24/htdocs/ansac/actor/\xc3\x85kerman, referer: http://ansac.com/
[Fri Oct 20 17:52:48.119655 2017] [rewrite:trace3] [pid 1620:tid 792] mod_rewrite.c(480): [client 127.0.0.1:52846] 127.0.0.1 - - [ansac.com/sid#2de668][rid#1fec230/initial] [perdir C:/Development/Apache24/htdocs/ansac/] strip per-dir prefix: C:/Development/Apache24/htdocs/ansac/actor/\xc3\x85kerman -> actor/\xc3\x85kerman, referer: http://ansac.com/
[Fri Oct 20 17:52:48.119655 2017] [rewrite:trace3] [pid 1620:tid 792] mod_rewrite.c(480): [client 127.0.0.1:52846] 127.0.0.1 - - [ansac.com/sid#2de668][rid#1fec230/initial] [perdir C:/Development/Apache24/htdocs/ansac/] applying pattern '^(.*)$' to uri 'actor/\xc3\x85kerman', referer: http://ansac.com/
[Fri Oct 20 17:52:48.119655 2017] [rewrite:trace1] [pid 1620:tid 792] mod_rewrite.c(480): [client 127.0.0.1:52846] 127.0.0.1 - - [ansac.com/sid#2de668][rid#1fec230/initial] [perdir C:/Development/Apache24/htdocs/ansac/] pass through C:/Development/Apache24/htdocs/ansac/actor, referer: http://ansac.com/

Strangely, this one works:

[Fri Oct 20 17:56:42.415807 2017] [rewrite:trace3] [pid 1620:tid 792] mod_rewrite.c(480): [client 127.0.0.1:52856] 127.0.0.1 - - [ansac.com/sid#2de668][rid#1ff8290/initial] [perdir C:/Development/Apache24/htdocs/ansac/] add path info postfix: C:/Development/Apache24/htdocs/ansac/actor -> C:/Development/Apache24/htdocs/ansac/actor/Gonz\xc3\xa1lez
[Fri Oct 20 17:56:42.415807 2017] [rewrite:trace3] [pid 1620:tid 792] mod_rewrite.c(480): [client 127.0.0.1:52856] 127.0.0.1 - - [ansac.com/sid#2de668][rid#1ff8290/initial] [perdir C:/Development/Apache24/htdocs/ansac/] strip per-dir prefix: C:/Development/Apache24/htdocs/ansac/actor/Gonz\xc3\xa1lez -> actor/Gonz\xc3\xa1lez
[Fri Oct 20 17:56:42.415807 2017] [rewrite:trace3] [pid 1620:tid 792] mod_rewrite.c(480): [client 127.0.0.1:52856] 127.0.0.1 - - [ansac.com/sid#2de668][rid#1ff8290/initial] [perdir C:/Development/Apache24/htdocs/ansac/] applying pattern '^(.*)$' to uri 'actor/Gonz\xc3\xa1lez'
[Fri Oct 20 17:56:42.415807 2017] [rewrite:trace4] [pid 1620:tid 792] mod_rewrite.c(480): [client 127.0.0.1:52856] 127.0.0.1 - - [ansac.com/sid#2de668][rid#1ff8290/initial] [perdir C:/Development/Apache24/htdocs/ansac/] RewriteCond: input='C:/Development/Apache24/htdocs/ansac/actor' pattern='!-f' => matched
[Fri Oct 20 17:56:42.415807 2017] [rewrite:trace4] [pid 1620:tid 792] mod_rewrite.c(480): [client 127.0.0.1:52856] 127.0.0.1 - - [ansac.com/sid#2de668][rid#1ff8290/initial] [perdir C:/Development/Apache24/htdocs/ansac/] RewriteCond: input='C:/Development/Apache24/htdocs/ansac/actor' pattern='!-d' => matched
[Fri Oct 20 17:56:42.415807 2017] [rewrite:trace2] [pid 1620:tid 792] mod_rewrite.c(480): [client 127.0.0.1:52856] 127.0.0.1 - - [ansac.com/sid#2de668][rid#1ff8290/initial] [perdir C:/Development/Apache24/htdocs/ansac/] rewrite 'actor/Gonz\xc3\xa1lez' -> 'index.php?do=actor/Gonz\xc3\xa1lez'
[Fri Oct 20 17:56:42.415807 2017] [rewrite:trace3] [pid 1620:tid 792] mod_rewrite.c(480): [client 127.0.0.1:52856] 127.0.0.1 - - [ansac.com/sid#2de668][rid#1ff8290/initial] split uri=index.php?do=actor/Gonz\xc3\xa1lez -> uri=index.php, args=do=actor/Gonz\xc3\xa1lez
[Fri Oct 20 17:56:42.415807 2017] [rewrite:trace3] [pid 1620:tid 792] mod_rewrite.c(480): [client 127.0.0.1:52856] 127.0.0.1 - - [ansac.com/sid#2de668][rid#1ff8290/initial] [perdir C:/Development/Apache24/htdocs/ansac/] add per-dir prefix: index.php -> C:/Development/Apache24/htdocs/ansac/index.php
[Fri Oct 20 17:56:42.415807 2017] [rewrite:trace2] [pid 1620:tid 792] mod_rewrite.c(480): [client 127.0.0.1:52856] 127.0.0.1 - - [ansac.com/sid#2de668][rid#1ff8290/initial] [perdir C:/Development/Apache24/htdocs/ansac/] strip document_root prefix: C:/Development/Apache24/htdocs/ansac/index.php -> /index.php
[Fri Oct 20 17:56:42.415807 2017] [rewrite:trace1] [pid 1620:tid 792] mod_rewrite.c(480): [client 127.0.0.1:52856] 127.0.0.1 - - [ansac.com/sid#2de668][rid#1ff8290/initial] [perdir C:/Development/Apache24/htdocs/ansac/] internal redirect with /index.php [INTERNAL REDIRECT]
[Fri Oct 20 17:56:42.415807 2017] [rewrite:trace3] [pid 1620:tid 792] mod_rewrite.c(480): [client 127.0.0.1:52856] 127.0.0.1 - - [ansac.com/sid#2de668][rid#1ffd7c0/initial/redir#1] [perdir C:/Development/Apache24/htdocs/ansac/] strip per-dir prefix: C:/Development/Apache24/htdocs/ansac/index.php -> index.php
[Fri Oct 20 17:56:42.415807 2017] [rewrite:trace3] [pid 1620:tid 792] mod_rewrite.c(480): [client 127.0.0.1:52856] 127.0.0.1 - - [ansac.com/sid#2de668][rid#1ffd7c0/initial/redir#1] [perdir C:/Development/Apache24/htdocs/ansac/] applying pattern '^(.*)$' to uri 'index.php'
[Fri Oct 20 17:56:42.415807 2017] [rewrite:trace4] [pid 1620:tid 792] mod_rewrite.c(480): [client 127.0.0.1:52856] 127.0.0.1 - - [ansac.com/sid#2de668][rid#1ffd7c0/initial/redir#1] [perdir C:/Development/Apache24/htdocs/ansac/] RewriteCond: input='C:/Development/Apache24/htdocs/ansac/index.php' pattern='!-f' => not-matched
[Fri Oct 20 17:56:42.415807 2017] [rewrite:trace1] [pid 1620:tid 792] mod_rewrite.c(480): [client 127.0.0.1:52856] 127.0.0.1 - - [ansac.com/sid#2de668][rid#1ffd7c0/initial/redir#1] [perdir C:/Development/Apache24/htdocs/ansac/] pass through C:/Development/Apache24/htdocs/ansac/index.php

Why does one work and not the other? Both names have accents in them, yet one fails.

Best Answer

A hypothesis...

RewriteRule .* index.php?request=$1 [L,QSA]

This line looks like an error, since the $1 backreference will always be empty as there is no captured group in the RewriteRule pattern.

The "framework" may still work (for the Latin alphabet) because it may be parsing the $_SERVER['REQUEST_URI'] PHP superglobal instead (as a fallback perhaps) - which many frameworks do. However, $_SERVER['REQUEST_URI'] will remain URL encoded (eg. /actor/%C3%85ker) - so this probably needs to be URL decoded (eg. /actor/Åker) before it can be routed through your framework. This might be where the problem is. A request like /actor/Aker, on the other hand, is the same whether it's URL encoded or not, so this would not affect URLs like this.

However, if your framework allows the requested URL to be overridden with the request URL parameter then consider changing the above directive to:

 RewriteRule (.*) index.php?request=$1 [L,QSA]

ie. enclose the RewriteRule pattern in parentheses.

This will result in the captured URL-path being passed in the request URL parameter. Now, the important difference with this is that the URL-path that the RewriteRule directive matches against is already URL decoded. So, the request URL parameter already contains the URL-decoded request (albeit less the slash prefix), eg. actor/Åker.


UPDATE: Try changing the RewriteRule pattern from .* to [\s\S]* instead. For example:

 RewriteRule ([\s\S]*) index.php?request=$1 [QSA,L]

This is just a slightly more encompassing pattern. Whilst . (dot) matches any-character (excluding newlines), [\s\S] matches any whitespace and any non-whitespace characters (ie. everything).