Blocking by user-agent string in httpd.conf not effective


I'd like to block some spiders and bad bots by user-agent string for all of my virtual hosts via httpd.conf, but have had no success so far. Below are the relevant contents of my httpd.conf file. Any ideas why this isn't working? env_module is loaded.

SetEnvIfNoCase User-Agent "^BaiDuSpider" UnwantedRobot
SetEnvIfNoCase User-Agent "^Yandex" UnwantedRobot
SetEnvIfNoCase User-Agent "^Exabot" UnwantedRobot
SetEnvIfNoCase User-Agent "^Cityreview" UnwantedRobot
SetEnvIfNoCase User-Agent "^Dotbot" UnwantedRobot
SetEnvIfNoCase User-Agent "^Sogou" UnwantedRobot
SetEnvIfNoCase User-Agent "^Sosospider" UnwantedRobot
SetEnvIfNoCase User-Agent "^Twiceler" UnwantedRobot
SetEnvIfNoCase User-Agent "^Java" UnwantedRobot
SetEnvIfNoCase User-Agent "^YandexBot" UnwantedRobot
SetEnvIfNoCase User-Agent "^bot*" UnwantedRobot
SetEnvIfNoCase User-Agent "^spider" UnwantedRobot
SetEnvIfNoCase User-Agent "^crawl" UnwantedRobot
SetEnvIfNoCase User-Agent "^NG\ 1.x (Exalead)" UnwantedRobot
SetEnvIfNoCase User-Agent "^MJ12bot" UnwantedRobot

<Directory "/var/www/">
    Order Allow,Deny
    Allow from all
    Deny from env=UnwantedRobot
</Directory>
<Directory "/srv/www/">
    Order Allow,Deny
    Allow from all
    Deny from env=UnwantedRobot
</Directory>

EDIT – @Shane Madden: I do have .htaccess files in each virtual host's document root with the following.

order allow,deny
deny from xxx.xxx.xxx.xxx
deny from xx.xxx.xx.xx
deny from xx.xxx.xx.xxx
...
allow from all

Could that be creating a conflict? Here is a sample VirtualHost config:

<VirtualHost xx.xxx.xx.xxx:80>
 ServerAdmin admin@domain.com
 ServerName domain.com
 ServerAlias www.domain.com
 DocumentRoot /srv/www/domain.com/public_html/
 ErrorLog "|/usr/bin/cronolog /srv/www/domain.com/logs/error_log_%Y-%m"
 CustomLog "|/usr/bin/cronolog /srv/www/domain.com/logs/access_log_%Y-%m" combined
</VirtualHost>

Best Answer

Try this in httpd.conf, and if it fails there, try it in a .htaccess file:

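   # Bad bot removal: redirect matching user agents to another site
   RewriteEngine on
   # Each condition tests the start of the User-Agent header; [OR] chains them
   RewriteCond %{HTTP_USER_AGENT} ^useragent1 [OR]
   RewriteCond %{HTTP_USER_AGENT} ^useragent2 [OR]
   RewriteCond %{HTTP_USER_AGENT} ^useragent3
   RewriteRule ^(.*)$ http://website-you-want-to-send-bad-bots-to.com [R,L]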

Follow this pattern, adding one RewriteCond per user agent, and don't put an [OR] flag on the very last one.
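
As a variation on the rules above, here is a minimal sketch that answers with 403 Forbidden instead of redirecting. It assumes mod_rewrite is loaded. The bot names are borrowed from the question's list, and the patterns are deliberately left unanchored, on the assumption that many crawlers send a User-Agent beginning with "Mozilla/5.0 (compatible; ..." rather than with the bot's own name. Treat it as a starting point, not a tested configuration.

```apache
# Sketch only: return 403 Forbidden to selected crawlers.
RewriteEngine On

# No leading ^, so the name may appear anywhere in the User-Agent header.
# [NC] makes the match case-insensitive; [OR] chains the conditions.
RewriteCond %{HTTP_USER_AGENT} Baiduspider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} YandexBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} MJ12bot [NC]

# "-" means no substitution; [F] sends 403 Forbidden.
RewriteRule ^ - [F]
```

If these lines go in the main server section of httpd.conf, keep in mind that mod_rewrite rules there are not inherited by virtual hosts by default. Each VirtualHost would need its own "RewriteEngine On" plus "RewriteOptions Inherit", or its own copy of the rules.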

EDIT: New solution:

If you want to block all well-behaved (friendly) bots, create a file called "robots.txt" and put it in the same directory as your index.html, so that it is served from the root of the site. Inside it, put this:

User-agent: *
Disallow: /

You'd still need to maintain a list like the one in my original answer (above) to block the bots that ignore robots.txt.
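
Such a list can also be kept with the SetEnvIfNoCase directives from the question. The following is a rough sketch, assuming Apache 2.2-style access control (Order/Allow/Deny) as used in the question. The three bot names are only examples from the question's list, and the pattern is unanchored so it can match a name that appears in the middle of the User-Agent header.

```apache
# Sketch only: flag unwanted crawlers, then deny flagged requests.
# One alternation replaces many single-name lines; there is no leading ^,
# so the name can appear anywhere in the User-Agent header.
SetEnvIfNoCase User-Agent "(Baiduspider|YandexBot|MJ12bot)" UnwantedRobot

<Directory "/srv/www/">
    # Allow is evaluated first, then Deny; a matching Deny wins.
    Order Allow,Deny
    Allow from all
    Deny from env=UnwantedRobot
</Directory>
```

One thing to check: a .htaccess file with its own Order/Allow/Deny lines, like the one shown in the question, may take precedence over the Directory block for that directory. If so, repeating "Deny from env=UnwantedRobot" inside the .htaccess file is one possible workaround.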