.htaccess url rewrite to stop bingbot immediately

.htaccess apache-2.2 mod-rewrite search-engine

I want to stop bingbot completely and immediately.

I'd like to do this using mod_rewrite in .htaccess.

I've got these rules …

Options +FollowSymLinks 
RewriteEngine on
RewriteCond %{HTTP_USER_AGENT}  ^bingbot/.*         [OR]
RewriteCond %{HTTP_USER_AGENT}  ^Bingbot/.*         [OR]
RewriteRule ^(.*)$ http://go.away/                  [L]

… but they're not working. What I can see in my logs is this type of entry …

msnbot-207-46-195-224.search.msn.com - - [11/Jul/2011:15:07:27 -0700] "GET /index.php?url_mainnav=13&url_subnav=131&url_expand=394,949,4631&url_startrow=110 HTTP/1.1" 403 502 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"

… I've tried numerous variations on the regex for HTTP_USER_AGENT, but I can't get the response I want, so I presume that the actual structure of the rules I'm using is incorrect.

Can anyone point me in the right direction?

By the way, I know this sort of thing is much better done in iptables etc., and I also know about robots.txt. It's shared hosting, so I don't have control of iptables, and I don't want to wait the six to eight hours for bingbot to reread robots.txt.


Well, things are moving forward. Taking the answer into account, I changed the rewrite rules to:

Options +FollowSymLinks 
RewriteEngine on
RewriteCond %{HTTP_USER_AGENT}  ^bingbot/.*             [OR,NC]
RewriteCond %{HTTP_USER_AGENT}  .*bingbot/.*            [OR]
RewriteCond %{HTTP_USER_AGENT}  .*Bingbot/.*            [OR]
RewriteRule ^(.*)$ http://go.away/                      [L]

The entries for bingbot are still appearing in the access log, but this has made me realise that (I think) I'm misinterpreting the HTTP response codes shown in the logs. It seems that 403 is 'Forbidden', so perhaps my rule is doing what I want (telling bingbot to go away) and the request is simply still being logged? I thought the log would not show requests that were pushed away by mod_rewrite. I'd be interested if anyone can comment, as I'm still not 100% sure that I'm getting rid of the accesses by bingbot.

Best Answer

Well, the regex in your RewriteCond demands that the User Agent start with bingbot. That's what the ^ in the regex does.

^bingbot/.*

Since the User Agent (from your log example) doesn't start with that, the condition doesn't match and the rule is skipped.

"Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"

Remove the ^ and it should work, though I've not tested.

A tip: you can remove duplication from your RewriteConds by making the match case-insensitive with the [NC] option.

RewriteCond %{HTTP_USER_AGENT}  bingbot/.*          [NC]
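
Putting both suggestions together, a minimal untested sketch might look like the following. It assumes you are happy to answer with 403 Forbidden via the [F] flag rather than redirecting to http://go.away/ (that swap is my assumption, not something from your rules), and it drops the trailing [OR], since there is no further condition for it to combine with.

```apache
Options +FollowSymLinks
RewriteEngine on

# Unanchored, case-insensitive match: "bingbot" may appear
# anywhere in the User-Agent string, in any letter case.
RewriteCond %{HTTP_USER_AGENT} bingbot [NC]

# "-" means no substitution; [F] sends 403 Forbidden.
# (Assumption: a 403 is acceptable in place of the redirect.)
RewriteRule ^ - [F]
```

A User Agent such as "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)" contains bingbot in the middle of the string, which is why the pattern here carries no ^ anchor.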