I have written a small website (4 pages, HTML only) and I want to remove the .html extension from the URL by putting some rewrite rules in my .htaccess file. I've Googled around and found several snippets similar to this:
```apache
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME}\.html -f
RewriteRule ^(.*)$ $1.html
</IfModule>
```
Both of the following URLs serve the same content (which I would expect):
https://example.io/contact
https://example.io/contact.html
However, the following gives a 500 error:
https://example.io/contact/
This directory does not exist, and if I remove the rewrite code above it will 404 instead, which is what I would expect. Why is the code above causing a 500 error?
Even more interesting is that this will 500:
https://example.io/contact/blah
But this will 404:
https://example.io/contact123/blah
Neither contact/ nor contact123/ exists as a directory, but contact.html does exist and contact123.html does not.
Any help or explanation would be appreciated.
Edit:
MrWhite has already given the correct answer, but for anyone looking at this in future, the Apache error logs look like this:
```
[Thu Oct 24 20:49:47.722210 2019] [core:error] [pid 13001:tid 139915446667008] [client 1.2.3.4:39006] AH00124: Request exceeded the limit of 10 internal redirects due to probable configuration error. Use 'LimitInternalRecursion' to increase the limit if necessary. Use 'LogLevel debug' to get a backtrace.
```
I had checked the logs and wasn't sure why it was happening but forgot to include this in the question.
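The AH00124 message only reports the end result. To see each pass of the rewrite loop, mod_rewrite's per-module trace logging can be raised. A minimal sketch, assuming Apache 2.4 (which the log format above suggests) and access to the main server or virtual-host configuration:

```apache
# Main server or <VirtualHost> config (LogLevel normally goes here rather
# than in .htaccess). Keeps the default "warn" level for everything else,
# so the AH00124 error is still logged, but writes each mod_rewrite step
# to the error log so every pass of the loop shows up.
# Trace levels above trace2 slow the server considerably: debugging only.
LogLevel warn rewrite:trace3
```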
Best Answer
tl;dr A request for `/contact/` (or `/contact/blah`) results in a rewrite loop (500 Internal Server Error response) because `REQUEST_FILENAME` contains the mapped filesystem path, not the URL-path you are expecting.

The "problem" is the use of `REQUEST_FILENAME` in the 2nd condition. The `REQUEST_FILENAME` server variable contains the absolute filesystem path after the URL has been mapped to the filesystem. This is not necessarily the same as the URL-path, but this condition assumes that it is. When the URL-path contains whole path segments that do not map to the filesystem (as in `/contact/blah` or `/contact123/blah`), the `REQUEST_FILENAME` is essentially "reduced" to the last path segment that maps to a directory, plus the "filename" (ie. `.../contact` and `.../contact123` respectively; the document root, ie. `/`, is the last matched directory in this example).

Request `/contact`
When you request `/contact`, the URL-path is `/contact` and `REQUEST_FILENAME` is `/path/to/document-root/contact`, so the `REQUEST_FILENAME` maps directly to the URL-path. The condition testing `/path/to/document-root/contact.html` is successful and the request is rewritten to `contact.html`. All is good.

Request `/contact/` or `/contact/blah`
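For reference while reading the walkthrough that follows, here are the question's rules again, with comments tracing what each part sees for a request to `/contact/`. This sketch assumes the `.htaccess` file sits in the document root; `/path/to/document-root` is a placeholder, as in the text.

```apache
# The question's rules, traced for a request to /contact/.
#
# Pass 1: the RewriteRule pattern sees the URL-path "contact/"
#   %{REQUEST_FILENAME}  = /path/to/document-root/contact (no trailing slash)
#   condition 1 (!-d)    : .../contact is not a directory  -> true
#   condition 2 (-f)     : .../contact.html is a file      -> true
#   substitution $1.html : contact/.html
# Pass 2: URL-path "contact/.html", but %{REQUEST_FILENAME} is still
#   .../contact, so both conditions pass again -> contact/.html.html
# ...and so on, until the limit of 10 internal redirects is exceeded (500).
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME}\.html -f
RewriteRule ^(.*)$ $1.html
```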
However, when you request `/contact/`, the URL-path is `/contact/`, but the `REQUEST_FILENAME` is again `/path/to/document-root/contact` (no slash suffix). The test condition is again successful (as above), but the request is rewritten to `contact/.html` (since `.html` is appended to the captured URL-path, ie. `$1.html`). Processing loops, `REQUEST_FILENAME` evaluates to the same as before (the condition is again successful) and the request is rewritten a 2nd time to `contact/.html.html`. Etc, etc, resulting in a rewrite loop which eventually reaches an internal limit (default 10), at which point it "breaks" and the server responds with a 500 Internal Server Error.

Request `/contact123/blah`
A request for `/contact123/blah`, on the other hand, results in a 404 because the `REQUEST_FILENAME` server variable becomes `/path/to/document-root/contact123`, and `/path/to/document-root/contact123.html` does not exist, so no rewrite occurs in the first place.

Solution
To "fix" this behaviour we need to make sure we are testing the same file/URL-path that we are ultimately rewriting to.
We can do this by constructing the absolute filename (to test) by concatenating the `DOCUMENT_ROOT` and `REQUEST_URI` server variables (or the `$1` backreference), which contain the root-relative URL-path. (Note that `REQUEST_URI` includes the slash prefix, whereas the `$1` backreference does not.)
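A minimal sketch of such a rule, assuming the `.htaccess` file is in the document root (so that `$1` is the full URL-path minus its leading slash) and including the `L` flag discussed further down:

```apache
RewriteEngine On

# Test the file the request will actually be rewritten to:
# <document root>/<URL-path>.html
# $1 comes from the RewriteRule pattern, which mod_rewrite matches
# before it evaluates the conditions.
RewriteCond %{DOCUMENT_ROOT}/$1.html -f
RewriteRule (.+) $1.html [L]
```

With this, `/contact` is rewritten to `contact.html`; on the follow-up pass the check for `contact.html.html` fails, so there is no loop. For `/contact/` the condition checks `/path/to/document-root/contact/.html`, which does not exist, so the request falls through to a 404.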
Now, the test condition is testing the same filesystem path that the request will be rewritten to (if successful).
There is no need to check that the request does not map to a directory when you are already checking that it maps to a file once the `.html` extension is appended, unless you also have directories with the same name as the file basename (eg. `basename.html` and `basename/`). But if that is the case, then one or the other is going to be inaccessible anyway, so the situation is best avoided.

A request for `/contact/`, `/contact/blah` or `/contact123/blah` now results in a 404, as expected.

Note that there's no need to backslash-escape the literal dot in the `RewriteCond` TestString, since this is not a regex.

Minor points... the `^` and `$` anchors on `^(.*)$` (and `^(.+)$`) are unnecessary since the `*` (and `+`) quantifier is greedy by default (although some users do still seem to like them for readability?). You should also include the `L` (`last`) flag on the `RewriteRule`. Whilst this is not necessary if this is the only (or last) rule in the `.htaccess` file, it probably will be if you add more rules later (and having to remember to modify existing rules in this way is prone to error).

With the use of the `$1` backreference in the `RewriteCond` directive, this does assume that the `.htaccess` file is in the document root; otherwise, the filesystem check as written will be incorrect. If the `.htaccess` file is located in a subdirectory, then change the `RewriteCond` directive to use the `REQUEST_URI` server variable instead.

Optimisation
You could avoid unnecessarily checking all requests that already contain a file extension (ie. all your static resources) by restricting the regex to URLs that do not contain what looks like a file extension.
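One simple way to do this, sketched here under the assumptions that the `.htaccess` file is in the document root and that none of the extensionless page URLs contain a dot, is to treat any dot as a possible extension and only match URL-paths without one:

```apache
RewriteEngine On

# ^([^.]+)$ only matches URL-paths that contain no dot at all, so requests
# for static files such as /style.css or /images/logo.png skip the -f check.
# Unlike (.+), the ^ and $ anchors are required here; without them the
# pattern would still match the dot-free part of a URL-path like style.css.
RewriteCond %{DOCUMENT_ROOT}/$1.html -f
RewriteRule ^([^.]+)$ $1.html [L]
```

As a side effect, the rewritten `contact.html` contains a dot, so the follow-up pass does not match the rule at all. If the `.htaccess` file is in a subdirectory, the condition would instead be `RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI}.html -f`; `REQUEST_URI` already starts with a slash, so no extra `/` is added.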