mod_rewrite syntax order
mod_rewrite has some specific ordering rules that affect processing. Before anything gets done, the RewriteEngine On directive needs to be given, as this turns on mod_rewrite processing. This should be before any other rewrite directives.
A RewriteCond preceding a RewriteRule makes that ONE rule subject to the conditional. Any following RewriteRules will be processed as if they were not subject to conditionals.
RewriteEngine On
RewriteCond %{HTTP_REFERER} ^https?://serverfault\.com(/|$)
RewriteRule ^/blog/(.*)\.html /blog/$1.sf.html
In this simple case, if the HTTP referrer is from serverfault.com, blog requests are rewritten to special serverfault pages (we're just that special). However, if the above block had an extra RewriteRule line:
RewriteEngine On
RewriteCond %{HTTP_REFERER} ^https?://serverfault\.com(/|$)
RewriteRule ^/blog/(.*)\.html /blog/$1.sf.html
RewriteRule ^/blog/(.*)\.jpg /blog/$1.sf.jpg
All .jpg files would go to the special serverfault pages, not just the ones with a referrer indicating it came from here. This is clearly not the intent of how these rules are written. It could be done with multiple RewriteCond lines:
RewriteEngine On
RewriteCond %{HTTP_REFERER} ^https?://serverfault\.com(/|$)
RewriteRule ^/blog/(.*)\.html /blog/$1.sf.html
RewriteCond %{HTTP_REFERER} ^https?://serverfault\.com(/|$)
RewriteRule ^/blog/(.*)\.jpg /blog/$1.sf.jpg
But it probably should be done with some trickier replacement syntax:
RewriteEngine On
RewriteCond %{HTTP_REFERER} ^https?://serverfault\.com(/|$)
RewriteRule ^/blog/(.*)\.(html|jpg) /blog/$1.sf.$2
The more complex RewriteRule contains the conditionals for processing. The last parenthetical, (html|jpg), tells RewriteRule to match either html or jpg, and to represent the matched string as $2 in the rewritten string. This is logically identical to the previous block, with two RewriteCond/RewriteRule pairs; it just does it on two lines instead of four.
Multiple RewriteCond lines are implicitly ANDed, and can be explicitly ORed. To handle referrers from both ServerFault and Super User (explicit OR):
RewriteEngine On
RewriteCond %{HTTP_REFERER} ^https?://serverfault\.com(/|$) [OR]
RewriteCond %{HTTP_REFERER} ^https?://superuser\.com(/|$)
RewriteRule ^/blog/(.*)\.(html|jpg) /blog/$1.sf.$2
To serve ServerFault referred pages with Chrome browsers (implicit AND):
RewriteEngine On
RewriteCond %{HTTP_REFERER} ^https?://serverfault\.com(/|$)
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*Chrome.*$
RewriteRule ^/blog/(.*)\.(html|jpg) /blog/$1.sf.$2
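The two forms can also be mixed. As a hedged sketch (this combination is not one of the examples above, and the Chrome test is simplified for illustration): a condition marked [OR] is grouped with the condition that follows it, and the result is then ANDed with the remaining conditions, so the block below should read as "(ServerFault OR Super User) AND Chrome".

```apache
RewriteEngine On
# Either referrer is acceptable (explicit OR between these two lines)...
RewriteCond %{HTTP_REFERER} ^https?://serverfault\.com(/|$) [OR]
RewriteCond %{HTTP_REFERER} ^https?://superuser\.com(/|$)
# ...and, in addition, the browser must identify itself as Chrome (implicit AND).
RewriteCond %{HTTP_USER_AGENT} Chrome
RewriteRule ^/blog/(.*)\.(html|jpg) /blog/$1.sf.$2
```

The order of the conditions matters here: moving the User-Agent line between the two referrer lines would change which conditions the [OR] joins.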
RewriteBase is also order specific as it specifies how following RewriteRule directives handle their processing. It is very useful in .htaccess files. If used, it should be the first directive under "RewriteEngine on" in an .htaccess file. Take this example:
RewriteEngine On
RewriteBase /blog
RewriteCond %{HTTP_REFERER} ^https?://serverfault\.com(/|$)
RewriteRule ^(.*)\.(html|jpg) $1.sf.$2
This is telling mod_rewrite that the directory holding this .htaccess file is reached by way of http://example.com/blog/, so a relative substitution such as $1.sf.$2 is turned back into a URL under /blog instead of being appended to the physical directory path (/home/$Username/public_html/blog). In an .htaccess file the RewriteRule pattern is matched with the directory prefix already stripped, so it considers its string-start to be after the "/blog/" in the URL. Here is the same thing written two different ways. One without RewriteBase, the other with:
RewriteEngine On
##Example 1: No RewriteBase, full URL-path in the substitution##
RewriteCond %{HTTP_REFERER} ^https?://serverfault\.com(/|$)
RewriteRule ^(.*)\.(html|jpg) /blog/$1.sf.$2
##Example 2: With RewriteBase##
RewriteBase /blog
RewriteCond %{HTTP_REFERER} ^https?://serverfault\.com(/|$)
RewriteRule ^(.*)\.(html|jpg) $1.sf.$2
As you can see, RewriteBase allows rewrite rules to stay relative to the web-site path to content rather than the web-server's filesystem path, which can make them more intelligible to those who edit such files. Also, it can make the directives shorter, which has an aesthetic appeal.
RewriteRule matching syntax
RewriteRule itself has a complex syntax for matching strings. I'll cover the flags (things like [PT]) in another section. Because sysadmins learn by example more often than by reading a man-page, I'll give examples and explain what they do.
RewriteRule ^/blog/(.*)$ /newblog/$1
The .* construct matches any single character (.) zero or more times (*). Enclosing it in parentheses tells it to provide the string that was matched as the $1 variable.
RewriteRule ^/blog/.*/(.*)$ /newblog/$1
In this case, the first .* was NOT enclosed in parens so isn't provided to the rewritten string. This rule removes a directory level on the new blog-site. (/blog/2009/sample.html becomes /newblog/sample.html).
RewriteRule ^/blog/(2008|2009)/(.*)$ /newblog/$2
In this case, the first parenthesis expression sets up a matching group. This becomes $1, which is not needed and therefore not used in the rewritten string.
RewriteRule ^/blog/(2008|2009)/(.*)$ /newblog/$1/$2
In this case, we use $1 in the rewritten string.
RewriteRule ^/blog/(20[0-9][0-9])/(.*)$ /newblog/$1/$2
This rule uses a special bracket syntax that specifies a character range. [0-9] matches the numerals 0 through 9. This specific rule will handle years from 2000 to 2099.
RewriteRule ^/blog/(20[0-9]{2})/(.*)$ /newblog/$1/$2
This does the same thing as the previous rule, but the {2} portion tells it to match the preceding element (a bracket expression in this case) exactly two times.
RewriteRule ^/blog/([0-9]{4})/([a-z]*)\.html /newblog/$1/$2.shtml
This case will match any lower-case letter in the second matching expression, and do so for as many characters as it can. The \. construct tells it to treat the period as an actual period, not the special character it is in previous examples. It will break if the file-name has dashes in it, though.
RewriteRule ^/blog/([0-9]{4})/([-a-z]*)\.html /newblog/$1/$2.shtml
This traps file-names with dashes in them. However, as - is a special character in bracket expressions, it has to be the first (or last) character in the expression.
RewriteRule ^/blog/([0-9]{4})/([-0-9a-zA-Z]*)\.html /newblog/$1/$2.shtml
This version traps any file name with letters, numbers or the - character in the file-name. This is how you specify multiple character sets in a bracket expression.
RewriteRule flags
The flags on rewrite rules have a host of special meanings and use cases.
RewriteRule ^/blog/([0-9]{4})/([-a-z]*)\.html /newblog/$1/$2.shtml [L]
The flag is the [L] at the end of the above expression. Multiple flags can be used, separated by a comma. The linked documentation describes each one, but here they are anyway:
L = Last. Stop processing RewriteRules once this one matches. Order counts!
C = Chain. Continue processing the next RewriteRule. If this rule doesn't match, then the next rule won't be executed. More on this later.
E = Set environment variable. Apache has various environment variables that can affect web-server behavior.
F = Forbidden. Returns a 403-Forbidden error if this rule matches.
G = Gone. Returns a 410-Gone error if this rule matches.
H = Handler. Forces the request to be processed by the specified content handler.
N = Next. Restarts rule processing from the first rule, using the URL as rewritten so far. BE CAREFUL! Loops can result.
NC = No case. Allows jpg to match both jpg and JPG.
NE = No escape. Prevents the rewriting of special characters (? # & etc) into their hex-code equivalents.
NS = No subrequests. If you're using server-side-includes, this will prevent matches to the included files.
P = Proxy. Forces the rule to be handled by mod_proxy. Transparently provide content from other servers, because your web-server fetches it and re-serves it. This is a dangerous flag, as a poorly written one will turn your web-server into an open-proxy and That is Bad.
PT = Pass Through. Hands the rewritten result back to URL mapping so that Alias, ScriptAlias and Redirect statements can act on it.
QSA = QSAppend. When the original request contains a query string (http://example.com/thing?asp=foo) and the rewritten string supplies a query string of its own, append the original query string instead of discarding it. If the rewritten string has no query string, the original one is passed through unchanged. Important for dynamic content.
R = Redirect. Provide an HTTP redirect to the specified URL. Can also provide exact redirect code [R=303]; the default is 302. Very similar to RedirectMatch, which is faster and should be used when possible.
S = Skip. Skip the next N rules [S=N] if this rule matches.
T = Type. Specify the mime-type of the returned content. Very similar to the AddType directive.
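As a rough illustration of how several of these flags combine, here is a minimal sketch in server (not .htaccess) context. The /blog/private/ path, the tag URL scheme and index.php are hypothetical names made up for this example, not part of the rules above.

```apache
RewriteEngine On

# R=301 sends a permanent redirect, NC ignores case, L stops rule processing.
RewriteRule ^/blog/([0-9]{4})/([-0-9a-z]*)\.html$ /newblog/$1/$2.shtml [R=301,NC,L]

# F returns 403 Forbidden; the "-" substitution means "do not rewrite the URL".
RewriteRule ^/blog/private/ - [F]

# The substitution sets its own query string, so QSA is needed to keep
# whatever query string the client originally sent.
RewriteRule ^/blog/tag/([-a-z]+)$ /blog/index.php?tag=$1 [QSA,L]
```

With the last rule, a request for /blog/tag/apache?page=2 should end up as /blog/index.php?tag=apache&page=2; without QSA the page=2 part would be dropped.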
You know how I said that RewriteCond applies to one and only one rule? Well, you can get around that by chaining.
RewriteEngine On
RewriteCond %{HTTP_REFERER} ^https?://serverfault\.com(/|$)
RewriteRule ^/blog/(.*)\.html /blog/$1.sf.html [C]
RewriteRule ^/blog/(.*)\.jpg /blog/$1.sf.jpg
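Because the first RewriteRule has the Chain flag, the second rewrite-rule is only evaluated when the first one matches, which in turn requires the preceding RewriteCond to match. Be aware that a chained rule sees the URL as already rewritten by the rule before it, and that a request the first rule does not match (a .jpg request here) skips the rest of the chain, so chaining suits rules that build on each other rather than independent alternatives like these two. Handy if Apache regular-expressions make your brain hurt. However, the all-in-one-line method I point to in the first section handles the html/jpg case correctly and is faster from an optimization point of view.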
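Another way to put several independent rules under one condition is the Skip flag. The sketch below is an alternative technique, not the chaining approach described above: it inverts the condition and jumps over the next two rules when the referrer is not serverfault.com. The count in [S=2] is an assumption tied to this exact block and has to be kept in step with the number of rules that follow.

```apache
RewriteEngine On
# Referrer is NOT serverfault.com: leave the URL alone ("-") and skip the next two rules.
RewriteCond %{HTTP_REFERER} !^https?://serverfault\.com(/|$)
RewriteRule ^ - [S=2]
# Only reached when the referrer is serverfault.com.
RewriteRule ^/blog/(.*)\.html /blog/$1.sf.html
RewriteRule ^/blog/(.*)\.jpg /blog/$1.sf.jpg
```

This behaves like an if/else block, at the cost of a magic number that is easy to forget when rules are added or removed.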
RewriteRule ^/blog/([0-9]{4})/([-0-9a-zA-Z]*)\.html /newblog/$1/$2.shtml
This can be made simpler through flags:
RewriteRule ^/blog/([0-9]{4})/([-0-9a-z]*)\.html /newblog/$1/$2.shtml [NC]
Some flags also apply to RewriteCond, notably NoCase:
RewriteCond %{HTTP_REFERER} ^https?://serverfault\.com(/|$) [NC]
This will also match "ServerFault.com".
Best Answer
Careful: you appear to be conflating concepts in the DNS with HTTP (web browsing) protocol concepts. These are different, and must be addressed separately.
DNS is used to provide a mapping between domain names and IP numbers (and other resource records, but that's not relevant here). In other words, it identifies a server responsible for www.user1owndomain.com, www.user2owndomain.com, www.mywebsite.com, and that is all.
The DNS has no concept of protocols which might connect to that server, and hence HTTP data such as pathnames (/user1) are nonsensical in a DNS sense. They are not even communicated to a DNS server during the address resolution stage. Remember that a domain name of www.domain.com does not imply this may only be used for web traffic; it is only convention that www.domain.com typically has a machine running at the other end listening on port 80 and serving a website, but it could equally also service SSH traffic, a mail server, etc.
HTTP is the protocol spoken between a browser and the server which the domain resolves to. If www.mywebsite.com is at IP 1.2.3.4 according to DNS, the browser connects to that server on port 80 for web browsing purposes.
If you need to perform redirection of web traffic, that is done at the HTTP level, and DNS has minimal bearing on the actual redirect being accomplished (even if you use a DNS alias or CNAME record; see below).
Yes. There are two possible routes:
Use an EC2 instance on which you run a web server (Apache, nginx, or similar). Configure DNS for the www.userXowndomain.com to resolve to the IP address of that server. Configure Apache virtual hosts for each of the user domains pointing to the web server. Configure each virtual host to do a redirect to the desired URL (www.mywebsite.com/user1), as per this SF question.
Net effect: DNS resolves to IP, browser connects to IP, Apache identifies domain and maps to virtual host, virtual host configuration sends HTTP redirect to proper URL.
If you don't want to run a full EC2 instance solely for web traffic, you can use S3 buckets for the purpose, in combination with Route 53 to direct traffic for each www.userXowndomain.com to the bucket. The bucket is subsequently configured with a redirect to the appropriate URL.
Note: if you have an EC2 instance for hosting www.mywebsite.com, there is no reason this cannot also be used for doing the redirects. It's just additional Apache virtual host configuration, so a separate instance is not required.
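For the Apache route, the per-domain configuration might look roughly like the sketch below. The domain names are the placeholders used in this answer, and two things are assumptions on my part: that the bare domain should be covered as well (ServerAlias), and that the rest of the requested path should be carried over to the target.

```apache
# One virtual host per customer domain, selected by the Host header.
<VirtualHost *:80>
    ServerName www.user1owndomain.com
    ServerAlias user1owndomain.com
    # Redirect appends whatever follows the matched prefix to the target,
    # so /about becomes http://www.mywebsite.com/user1/about
    Redirect 301 / http://www.mywebsite.com/user1/
</VirtualHost>

<VirtualHost *:80>
    ServerName www.user2owndomain.com
    ServerAlias user2owndomain.com
    Redirect 301 / http://www.mywebsite.com/user2/
</VirtualHost>
```

The trailing slash on the target matters: without it, a request for /about would be sent to /user1about. Which status code to use is covered under "What type of redirect?".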
There is no hard limit. There may be artificially imposed limits due to Apache configuration on the number of virtual hosts (essentially due to logging and the number of open file descriptors), but those can be overcome through careful configuration.
Soft limits might come into play if the redirects are extremely busy, such that the server gets overloaded. This is something you would detect through routine performance monitoring, and address by introducing more servers for load balancing (or peeling off busy work to dedicated machines). Unless you are running very high-traffic sites, this is unlikely to be a concern.
As noted above, take care not to confuse DNS with HTTP. If the user's domains have their own DNS service, such as one provided by the registrar, you can make use of that to configure DNS records pointing at the EC2 instance running the web server redirects. Alternatively, you can delegate the domain's nameservers to Route 53. The ultimate answer probably depends on your relationship with the customer: the former solution allows the customer to manage their own DNS (for other services, such as mail, for example). The latter turns over control of the entire userXowndomain.com namespace to you, meaning you must configure ALL aspects of DNS related to that domain. This might include but is not limited to MX records for mail delivery, any other subdomains the customer desires, aliases, SPF records, etc.
Apache running on Linux would be one method of achieving this, and is probably cost effective in doing so. There are plenty of other web serving platforms just as capable of issuing HTTP redirects. Configuration is in accordance with the respective pages for each web server, configuring virtual hosts for each domain and configuring an HTTP redirect for that domain.
What type of redirect?
The HTTP/1.1 standard, in RFC 2616, defines multiple types of redirect. You will need to specify the type of redirect the web server should serve to redirect www.userXowndomain.com --> www.mywebsite.com/userX. It would be instructive to study the RFC, since different redirects imply different behaviour, and can have secondary impacts on issues such as SEO.
The most common redirects are those defined by HTTP status codes 301 and 307, corresponding to "permanently moved to another URL" and "temporarily moved to another URL" respectively. A temporarily moved URL, in particular, implies that the original URL (www.userXowndomain.com) should still be used to access the requested resource in the future, and user agents should not update their records to permanently use www.mywebsite.com/user1 as the initial URL.
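In Apache the status code is simply an argument to the redirect directive. A minimal sketch, using the placeholder domains from this answer; the two Redirect lines are alternatives, so only one of them should be active at a time.

```apache
<VirtualHost *:80>
    ServerName www.user1owndomain.com

    # 301: permanent. Clients and search engines may remember the new URL.
    Redirect 301 / http://www.mywebsite.com/user1/

    # 307: temporary. Clients should keep using the original URL in future.
    # Uncomment this line and remove the one above to switch behaviour.
    # Redirect 307 / http://www.mywebsite.com/user1/
</VirtualHost>
```

Starting with a temporary redirect while testing is a common precaution, since browsers tend to cache permanent redirects aggressively.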
Redirects using "framesets" have also been used in the past. This is the case where you host a single web page at each user's domain, which uses the <frameset> tag with a single frame, pointing at the correct URL. This approach should generally be avoided.
Using DNS aliases as "redirects"
There is often confusion about the role of DNS and the HTTP protocol in web browsing, in particular because DNS provides CNAME or "alias" records, which alias the DNS return of one domain to the output of another.
A DNS alias means nothing to a web browser; a DNS alias merely tells the DNS resolver that requests for domain X should be answered as if it were domain Y being requested, and by following the chain, an IP address should eventually be reached to which the browser connects. The browser then connects to that address, still believing it is talking to the originally requested domain (e.g. www.userXowndomain.com).
You could implement your redirects using CNAMEs for each user domain, with www.userXowndomain.com pointing at the www.mywebsite.com domain in the DNS. However, you still need to configure the web server with virtual hosts or similar to match each request for www.userXowndomain.com and conduct the redirect to www.mywebsite.com/userX. The DNS just tells the browser how to reach the web server; it doesn't tell the browser what the redirection URL is.
This configuration is not without its faults, however. If you want to redirect the root of the zone (i.e. http://userXowndomain.com), you cannot simply use a CNAME there: according to the RFCs, a CNAME may not co-exist with other data at the same name, and the root of the zone always carries other records (SOA and NS, and often an MX record for mail delivery). This might not be a problem, or it might be a showstopper, depending on the service you offer your users.
Use the registrar?
I should note that many domain registrars provide basic 301 or 307 HTTP redirects as standard with a domain name registration, and hence you may be able to avoid considerable complexity by employing their service rather than building your own.