Blocking all users without referrer BUT allowing Googlebot/bingbot at the same time (with .htaccess)

.htaccess, mod-rewrite, setenv

Because of an amateurish DDoS attack on my website, I had to deny some traffic with .htaccess, which worked fine.

Unfortunately, it also blocks Googlebot and bingbot:

order allow,deny
deny from 54.

SetEnvIfNoCase Referer "^$" bad_user
SetEnvIfNoCase User-Agent "^Wget" bad_user
Deny from env=bad_user

It simply blocks all traffic from 54.x.x.x. The only traffic I get from that range comes from infected Amazon cloud machines. I know I could exclude just the 30 or so IP ranges of the Amazon cloud instead of the whole 54.x.x.x, but I needed a fast solution.

The rest of the bots (most of them from China, Taiwan and so on) don't send a referrer, so:
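As a rough sketch of the narrower variant mentioned above, the single `deny from 54.` line could be replaced with one `Deny from` line per CIDR range. The two ranges below are placeholders taken from the RFC 5737 documentation address space, not real Amazon ranges; the actual list would have to come from Amazon's published IP ranges.

```apache
# Sketch: deny specific CIDR ranges instead of the whole 54.0.0.0/8 block.
# These ranges are placeholders (RFC 5737 documentation addresses),
# NOT real Amazon cloud ranges.
Deny from 192.0.2.0/24
Deny from 198.51.100.0/24
```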

SetEnvIfNoCase Referer "^$" bad_user

blocks them all.

But it also has side effects:

  1. When somebody visits my page from a bookmark or types the address directly into the browser (e.g. after reading it on a business card), they won't see my website.
  2. Googlebot and bingbot (as well as other, less important bots) usually don't send a referrer either.

#1 is an inconvenience, but #2 is a real problem I have to solve quickly.

I've found that the bots that matter to me identify themselves with these labels (user-agent strings):

66.249.64.119 - - [...] "GET /robots.txt HTTP/1.1" 403 534 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.64.119 - - [...] "GET /programowanie/ HTTP/1.1" 403 537 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.64.115 - - [...] "GET /3d-graphic/ HTTP/1.1" 403 535 "-" "Mozilla/5.0 (iPhone; CPU iPhone OS 6_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10A5376e Safari/8536.25 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

207.46.13.4 - - [...] "GET /robots.txt HTTP/1.1" 403 534 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
207.46.13.4 - - [...] "GET / HTTP/1.1" 403 524 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"

Is it possible in .htaccess to combine my rules with an overriding exception such as "if the label contains Googlebot or bingbot, let the request through", even when there is no referrer?

If not, can I add something to robots.txt to tell Google/Bing that they should send a referrer with their requests (I doubt they would take it into account)?

Best Answer

I have found a solution for #2:

order deny,allow
deny from 54.

SetEnvIfNoCase Referer "^$" bad_user
SetEnvIfNoCase User-Agent "^Wget" bad_user
SetEnvIfNoCase User-Agent "http://www.bing.com/bingbot.htm" good_user
SetEnvIfNoCase User-Agent "http://www.google.com/bot.html" good_user
Deny from env=bad_user
Allow from env=good_user

Note the order deny,allow. With this order the Deny rules are evaluated first and the Allow rules second, so a request that matches both is let through:

  1. Block all traffic from 54.x.x.x. Also block all traffic without a referrer.
  2. Then unblock every request whose User-Agent contains either http://www.bing.com/bingbot.htm or http://www.google.com/bot.html.
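On Apache 2.4 the Order/Allow/Deny directives are only available through mod_access_compat, so the same two steps could also be written with the newer Require syntax. This is a minimal sketch and not part of the original answer; it assumes Apache 2.4 with mod_authz_core, mod_authz_host and mod_setenvif enabled, and it should replace the old directives rather than be mixed with them.

```apache
# Sketch for Apache 2.4 (assumption: mod_authz_core, mod_authz_host
# and mod_setenvif are available and allowed in .htaccess).
SetEnvIfNoCase Referer "^$" bad_user
SetEnvIfNoCase User-Agent "^Wget" bad_user
SetEnvIfNoCase User-Agent "http://www\.bing\.com/bingbot\.htm" good_user
SetEnvIfNoCase User-Agent "http://www\.google\.com/bot\.html" good_user

<RequireAny>
    # Step 2: the search engine bots are always let through.
    Require env good_user
    # Step 1: everyone else must not come from 54.x.x.x
    # and must not be flagged as bad_user.
    <RequireAll>
        Require all granted
        Require not ip 54.0.0.0/8
        Require not env bad_user
    </RequireAll>
</RequireAny>
```

The outer RequireAny plays the role of "Allow wins over Deny": a request flagged good_user is authorized even if the inner block would reject it.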

Anyway, I will wait for other answers, because I'm not sure this is the optimal solution for #2.

And I still have not managed to solve #1.

So if you want to:

block all users without referrer BUT allow Googlebot/bingbot at the same time

you can just use my .htaccess code without the deny from 54. and SetEnvIfNoCase User-Agent "^Wget" bad_user lines, which are specific to my case (DDoS).
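For reference, dropping those two lines from the code in this answer leaves roughly the following. It is the same set of directives with nothing new added, and it has the same limitation: visitors who arrive without a referrer (problem #1) are still blocked.

```apache
# Generic variant: block requests without a referrer,
# but let Googlebot and bingbot through.
Order Deny,Allow

SetEnvIfNoCase Referer "^$" bad_user
SetEnvIfNoCase User-Agent "http://www.bing.com/bingbot.htm" good_user
SetEnvIfNoCase User-Agent "http://www.google.com/bot.html" good_user

# Deny is evaluated first, Allow second, so good_user overrides bad_user.
Deny from env=bad_user
Allow from env=good_user
```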
