Create a rule in .htaccess to block requests to the site

.htaccess, mod-rewrite

Today my server was flooded with hundreds of requests to the contact page of my site (/contact) in just 2 minutes.

I got hundreds of these lines in my Apache log:

31.13.115.6 - - [18/Jun/2019:10:54:39 +0200] "GET /contacto HTTP/1.1" 301 331 "-" "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)" 232

31.13.115.25 - - [18/Jun/2019:10:54:39 +0200] "GET /contacto HTTP/1.1" 301 331 "-" "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)" 232

I'm not sure what caused this, but it brought my server down. I want to make sure it doesn't happen again.

My server provider told me that I can block these requests by adding a rule to my .htaccess using RewriteCond.

I know that I will have to use something like:

RewriteCond %{HTTP_USER_AGENT} "facebookexternalhit/1.1"

but I don't have much knowledge about this.



UPDATE for MrWhite:

I think I know what the problem could be. I have an old site, oldsite.com, which redirects to my new site, newsite.com. In the .htaccess of oldsite.com I added these lines to create the redirect:

Rules in oldsite.com/.htaccess

RewriteEngine on
RewriteRule ^(.*)$ https://www.newsite.com/$1 [R=301,L]

I created this rule when I changed my site's domain; its goal is to redirect traffic from the old site to the new site without hurting SEO.

It worked fine until now. Do you think this could be the cause? If so, should I change this rule in www.oldsite.com/.htaccess instead of adding other rules to www.newsite.com/.htaccess?

Best Answer

You state that these requests are for your contact page /contact; however, the log entries you've posted are for /contacto (an extra "o"), and these show a 301 redirect response, which will trigger a second request to your server (providing the crawler follows redirects). Why is there a 301 redirect? To what page are you redirecting?

These do appear to relate to the genuine Facebook "crawler", but as noted in numerous Stack Overflow questions, the Facebook crawler does seem to be prone to being rather aggressive!

RewriteCond %{HTTP_USER_AGENT} "facebookexternalhit/1.1"

The RewriteCond (condition) directive alone does nothing. You need a RewriteRule to actually do something.

For example:

RewriteCond %{HTTP_USER_AGENT} ^facebookexternalhit/1\.1
RewriteRule ^contact$ - [F]

The above will send a 403 Forbidden response for all requests to /contact where the User-Agent starts with facebookexternalhit/1.1. (It's a regex, so the literal dot should be backslash-escaped.)

The request naturally still hits your application server (to block the request entirely you would need some kind of proxy in front of it), but it now does very little when it does.
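Given your update, the cleaner fix is probably at the old domain itself: deny the crawler there, before the blanket redirect, so it never receives the 301 (and never makes the follow-up request to the new site). A minimal sketch for oldsite.com/.htaccess, assuming the redirect shown in your update is the only rule in that file:

RewriteEngine on

# Deny the Facebook crawler before the redirect, so it gets a 403 on
# the old domain instead of a 301 pointing at the new one.
RewriteCond %{HTTP_USER_AGENT} ^facebookexternalhit/1\.1
RewriteRule ^ - [F]

# The existing blanket redirect to the new domain, unchanged.
RewriteRule ^(.*)$ https://www.newsite.com/$1 [R=301,L]

Bear in mind that facebookexternalhit is also what fetches link previews when a URL is shared on Facebook, so blocking it wholesale on the old domain means shares of old URLs will lose their previews.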

The accepted answer on the linked question above talks about sending a 429 Too Many Requests status instead (together with a Retry-After header), but only after a certain number of requests in quick succession (a PHP script is provided).
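If you simply want to send a 429 instead of a 403 (without the request counting, which needs application code like that PHP script), mod_rewrite's R flag accepts arbitrary status codes. A rough sketch, assuming Apache 2.4:

# Answer matching requests with "429 Too Many Requests" instead of 403.
# A status outside the 300-399 redirect range makes the R flag drop the
# substitution and stop rewriting, as if [L] were given.
RewriteCond %{HTTP_USER_AGENT} ^facebookexternalhit/1\.1
RewriteRule ^contact$ - [R=429]

This answers every matching request with a 429 unconditionally; genuine rate limiting (counting requests and sending a meaningful Retry-After header) has to live in application code, as in the PHP script mentioned above.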
