There are a couple of things I'm trying to understand in relation to a RewriteRule.
The working rule takes the target URL out of the query string and redirects to it, eg. for the URL:
https://www.example.com/application?user=543&AppLink=https://www.example.net/register/reg.aspx?EnquiryID=12345
The working .htaccess code:
RewriteCond %{REQUEST_URI} ^/application$
RewriteCond %{QUERY_STRING} .*AppLink=(.*)
RewriteRule ^(.*)$ %1 [R=302,L]
Results in (correctly) the redirect URL:
https://www.example.net/register/reg.aspx?EnquiryID=12345
All good until I want to introduce URL encoding into the query link, eg:
https://www.example.com/application?user=543&AppLink=https%3A%2F%2Fwww.example.net%2Fregister%2Freg.aspx?EnquiryID=12345
First up, introducing the encoding breaks the working RewriteRule, resulting in the following, with the HTTP_HOST name back in. I don't follow why it does that:
https://www.example.com/https%3A%2F%2Fwww.example.net%2Fregister%2Freg.aspx?EnquiryID=12345
Therefore, I'm trying to figure out the best way of "decoding"/stripping the (eg) %3A%2F%2F back into colons and slashes prior to it pulling the query as a valid URL for the redirect function.
I am assuming, in a way, I need to create a 'looping' RewriteRule to tidy up the encode (regex) then redirect that at the same Host, strip the valid URL and send it off to the redirected host!
Messy and overhead, yes.
Anyone have a suggestion or thoughts on the best way to attack this?
Best Answer
This is really a task for your web application (eg. PHP, Python, etc.), not Apache (.htaccess).
If this script is "public", bear in mind that "redirect" scripts of this nature are often heavily abused by scammers (for example), so you need to white-list the possible redirect targets (and optionally authenticate the sender). This may be tricky to implement in .htaccess and is probably far better suited to your application itself.
The characters : and / don't need to be URL encoded when they appear in the query string part of the URL. But if you were to properly URL encode the AppLink URL param value then you would also %-encode the ? and = (part of the target URL).
The QUERY_STRING server variable is not %-decoded. So, the resulting substitution string is:
https%3A%2F%2Fwww.example.net%2Fregister%2Freg.aspx?EnquiryID=12345
Apache/mod_rewrite sees this as a relative URL, because it does not start with a slash or valid scheme (ie. https://). In the case of a relative URL, mod_rewrite uses the scheme and hostname (and directory-prefix or value of the RewriteBase directive) from the current request (by default), in order to construct an absolute URL for the external redirect, hence the malformed redirect you are seeing.
Solution
As noted above, I would recommend doing this in your application, not .htaccess. But anyway, to answer your specific question, you could do something like the following instead of your current directives. However, this requires Apache 2.4+ and access to your server-config (since AllowEncodedSlashes is not permitted in a directory/.htaccess context):
The following needs to go in your server-config (or virtualhost):
Then, in .htaccess:
Notes:
- The URL-path is checked in the RewriteRule pattern instead of using an additional condition that checks against the REQUEST_URI server variable.
- The QSD (Query String Discard) flag is required to discard the AppLink (and any other) URL parameter from the initial request.
- RewriteRule directives naturally chain together: the output of one is used as the input of the next, and so on.
- The URL-path that the RewriteRule pattern matches against is %-decoded. (Whereas the QUERY_STRING server variable remains %-encoded.) However, contiguous slashes in the URL-path are reduced to single slashes. Hence the check for just https:/ (not https://) in the RewriteRule pattern and the additional slash that is added in the substitution.
This also assumes that additional pathname information is permitted in your config. If it is not, you may need to explicitly set AcceptPathInfo On in .htaccess (or server-config). Otherwise, you will get a system-generated 404.
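For the application-side approach recommended at the top of this answer, a minimal Python sketch of the decode-and-white-list step might look like the following. The function name redirect_target and the ALLOWED_HOSTS set are hypothetical names chosen for illustration, and the policy (https only, exact host match) is an assumption; only the standard library urllib.parse module is used.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical allow-list: only these hosts are accepted as redirect targets.
ALLOWED_HOSTS = {"www.example.net"}


def redirect_target(query_string):
    """Return the decoded AppLink URL if it is allowed, otherwise None."""
    # parse_qs() splits the raw query string on "&" and then %-decodes
    # each value, so encoded and unencoded AppLink values both work.
    values = parse_qs(query_string).get("AppLink")
    if not values:
        return None
    target = values[0]
    try:
        parts = urlsplit(target)
    except ValueError:
        # Malformed URL (eg. a broken IPv6 literal in the host part).
        return None
    # Assumed policy: absolute https URLs to white-listed hosts only.
    if parts.scheme != "https" or parts.hostname not in ALLOWED_HOSTS:
        return None
    return target
```

The application would call this with the raw query string of the request and issue the 302 redirect only when a URL is returned. Anything else (a missing parameter, a relative URL, an unlisted host) is rejected rather than redirected.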