Mod_Rewrite – Redirect to %-Encoded URL Parameter in .htaccess


There are a couple of things I'm trying to understand about a RewriteRule.

My working rule takes a URL and redirects to the URL supplied in its query string, eg. the URL:

https://www.example.com/application?user=543&AppLink=https://www.example.net/register/reg.aspx?EnquiryID=12345

The working .htaccess code:

RewriteCond %{REQUEST_URI}  ^/application$
RewriteCond %{QUERY_STRING} .*AppLink=(.*)
RewriteRule ^(.*)$ %1 [R=302,L]

Results in (correctly) the redirect URL:

https://www.example.net/register/reg.aspx?EnquiryID=12345

All good until I want to introduce URL encoding into the query link, eg:

https://www.example.com/application?user=543&AppLink=https%3A%2F%2Fwww.example.net%2Fregister%2Freg.aspx?EnquiryID=12345

First up, the introduction of the encode breaks the working RewriteRule, resulting in this with the http_host name back in – I don't follow why it does that:

https://www.example.com/https%3A%2F%2Fwww.example.net%2Fregister%2Freg.aspx?EnquiryID=12345

Therefore, I'm trying to figure out the best way of "decoding" the %-encoded characters (eg. %3A%2F%2F) back into colons and slashes before the rule pulls the query value out as a valid URL for the redirect.

I am assuming I need to create a 'looping' RewriteRule that tidies up the encoding (with a regex), redirects back to the same host, then extracts the valid URL and sends the request off to the target host!

Messy, with extra overhead, yes.

Anyone have a suggestion or thoughts on the best way to attack this?

Best Answer

...the best way to attack this?

This is really a task for your web application (eg. PHP, Python, etc.), not Apache (.htaccess).

If this script is "public", be aware that "redirect" scripts of this nature are often heavily abused (by scammers, for example), so you need to whitelist the possible redirect targets (and optionally authenticate the sender). This may be tricky to implement in .htaccess and is probably far better suited to your application itself.
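As a rough illustration of the application-side approach, here is a minimal Python sketch that %-decodes the AppLink parameter and only accepts targets on an allow-list. The names ALLOWED_HOSTS and redirect_target are hypothetical (they are not from the question), and how you issue the actual 302 depends on your framework.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical allow-list: replace with the hosts you really redirect to.
ALLOWED_HOSTS = {"www.example.net"}


def redirect_target(query_string):
    """Return the AppLink URL if it is safe to redirect to, else None."""
    # parse_qs() splits the query string and %-decodes each value.
    values = parse_qs(query_string).get("AppLink")
    if not values:
        return None
    target = values[0]
    parts = urlsplit(target)
    # Only allow absolute http(s) URLs on a known host.
    if parts.scheme not in ("http", "https"):
        return None
    if parts.hostname not in ALLOWED_HOSTS:
        return None
    return target
```

Anything that returns None here should result in an error page (or a fixed default URL) rather than a redirect.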

https://www.example.com/application?user=543&AppLink=https%3A%2F%2Fwww.example.net%2Fregister%2Freg.aspx?EnquiryID=12345

The characters : and / don't need to be URL encoded when they appear in the query string part of the URL. But if you were to properly URL encode the AppLink URL param value then you would also %-encode the ? and = (part of the target URL).
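To show what a fully %-encoded AppLink value looks like, here is a small Python sketch using the standard urllib.parse module. The variable names are only for illustration; whatever generates your links would do the equivalent.

```python
from urllib.parse import quote, urlencode

target = "https://www.example.net/register/reg.aspx?EnquiryID=12345"

# safe="" makes quote() encode every reserved character, including "/".
encoded = quote(target, safe="")
print(encoded)
# https%3A%2F%2Fwww.example.net%2Fregister%2Freg.aspx%3FEnquiryID%3D12345

# urlencode() encodes each value in the same way while building the
# complete query string.
query = urlencode({"user": "543", "AppLink": target})
print("https://www.example.com/application?" + query)
```

Note that the ? and = from the target URL become %3F and %3D, so the whole target stays inside the single AppLink parameter.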

First up, the introduction of the encode breaks the working RewriteRule, resulting in this with the http_host name back in - I don't follow why it does that:

The QUERY_STRING server variable is not %-decoded. So, the resulting substitution string is:

https%3A%2F%2Fwww.example.net%2Fregister%2Freg.aspx?EnquiryID=12345

Apache/mod_rewrite sees this as a relative URL, because it does not start with a slash or valid scheme (eg. https://). In the case of a relative URL, mod_rewrite uses the scheme and hostname (and directory-prefix or value of the RewriteBase directive) from the current request (by default), in order to construct an absolute URL for the external redirect, hence the malformed redirect you are seeing.
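The same effect can be reproduced outside Apache. The sketch below uses Python's urljoin() purely as an analogy for resolving a relative reference against the current request. It is not how mod_rewrite is implemented, and it assumes the .htaccess file is in the document root.

```python
from urllib.parse import urljoin

current_request = "https://www.example.com/application"

# Still %-encoded: there is no literal "scheme://" prefix, so this is
# treated as a relative reference and resolved against the current host.
encoded = "https%3A%2F%2Fwww.example.net%2Fregister%2Freg.aspx?EnquiryID=12345"
print(urljoin(current_request, encoded))
# https://www.example.com/https%3A%2F%2Fwww.example.net%2Fregister%2Freg.aspx?EnquiryID=12345

# Decoded: an absolute URL with its own scheme and host, so it is used as-is.
decoded = "https://www.example.net/register/reg.aspx?EnquiryID=12345"
print(urljoin(current_request, decoded))
# https://www.example.net/register/reg.aspx?EnquiryID=12345
```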

Solution

As noted above, I would recommend doing this in your application, not .htaccess. But to answer your specific question, you could do something like the following instead of your current directives. Note that this requires Apache 2.4+ and access to your server-config (since AllowEncodedSlashes is not permitted in a directory/.htaccess context).

The following needs to go in your server-config (or virtualhost):

# Allow %2F to be used in the URL-path part of the URL
# Otherwise Apache will trigger a system generated 404 (security feature)
AllowEncodedSlashes On

Then, in .htaccess:

# Convert URL param value to path-info (via URL rewrite)
# This essentially %-decodes the URL parameter value
RewriteCond %{QUERY_STRING} AppLink=(.+)
RewriteRule ^application$ /application/%1 [QSD]

# Issue redirect using the %-decoded URL-path
RewriteRule ^application/(https?:/)(.+) $1/$2 [R,L]

Notes:

  • When possible, it is more efficient to check the URL-path using the RewriteRule pattern instead of using an additional condition that checks against the REQUEST_URI server variable.
  • The QSD (Query String Discard) flag is required to discard the AppLink (and any other) URL parameter from the initial request.
  • The first URL rewrite is passed through to the next directive that triggers the actual redirect. RewriteRule directives naturally chain together, the output of one is used as the input of the next, and so on.
  • The URL-path that the RewriteRule pattern matches against is %-decoded. (Whereas the QUERY_STRING server variable remains %-encoded.) However, contiguous slashes in the URL-path are reduced to single slashes. Hence the check for just https:/ (not https://) in the RewriteRule pattern and the additional slash that is added in the substitution.
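To make the notes above more concrete, here is a rough Python simulation of what the two rules do to the example request. It only mimics the behaviour described here (moving the value into the path, %-decoding, collapsing slashes, restoring the second slash) and is not Apache's actual implementation. The function name simulate is made up for this sketch.

```python
import re
from urllib.parse import unquote


def simulate(query_string):
    """Mimic the two RewriteRules for a request to /application."""
    cond = re.search(r"AppLink=(.+)", query_string)
    if not cond:
        return None
    # Rule 1: move the parameter value into the URL-path. A literal "?"
    # in the value starts the new query string.
    rewritten = "application/" + cond.group(1)
    path, _, query = rewritten.partition("?")
    # The URL-path is %-decoded and runs of slashes collapse to one.
    path = re.sub(r"/{2,}", "/", unquote(path))
    # Rule 2: match "https:/" and put the second slash back.
    rule = re.match(r"application/(https?:/)(.+)", path)
    if not rule:
        return None
    target = rule.group(1) + "/" + rule.group(2)
    return target + ("?" + query if query else "")


qs = "user=543&AppLink=https%3A%2F%2Fwww.example.net%2Fregister%2Freg.aspx?EnquiryID=12345"
print(simulate(qs))
# https://www.example.net/register/reg.aspx?EnquiryID=12345
```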

This also assumes that additional pathname information (path-info) is permitted in your config. If it is not, you will also get a system generated 404, in which case you may need to explicitly set AcceptPathInfo On in .htaccess (or server-config).