Iptables – blocking spiders via iptables

firewall, iptables

My server is getting killed by Baiduspider, and no matter what I put in my robots.txt file, nothing changes. So, temporarily, I need to block as many of its IP addresses as I can via iptables. I'm collecting the IP addresses with:

grep -ri Baidu /var/log/apache2/access.log | cut -f1 -d' ' | sort | uniq

And my iptables rules look like:

Chain INPUT (policy ACCEPT)
target     prot opt source               destination         
ACCEPT     all  --  anywhere             anywhere            
ACCEPT     all  --  anywhere             anywhere            ctstate RELATED,ESTABLISHED 
ACCEPT     tcp  --  anywhere             anywhere            tcp dpt:ftp 
ACCEPT     tcp  --  anywhere             anywhere            tcp dpt:ftp-data 
ACCEPT     tcp  --  anywhere             anywhere            tcp dpt:ssh 
ACCEPT     tcp  --  anywhere             anywhere            tcp dpt:www 
ACCEPT     tcp  --  anywhere             anywhere            tcp dpt:https 
ACCEPT     tcp  --  anywhere             anywhere            tcp dpt:snpp 
ACCEPT     tcp  --  anywhere             anywhere            tcp dpt:mysql 
LOG        all  --  anywhere             anywhere            limit: avg 5/min burst 5 LOG level debug prefix `iptables denied: ' 
DROP       all  --  anywhere             anywhere 

I'm planning on adding the new rules with something similar to:

for ip in $(grep -ri Baidu /var/log/apache2/access.log | cut -f1 -d' ' | sort | uniq); do iptables -A INPUT -s $ip -j DROP; done

But I don't think I can just add them. I think they need to be inserted in a specific location. Where do they need to be inserted to take effect?

Best Answer

Your first rule appears to accept everything, which makes all of the rules below it pointless (is that intended?).

You'd need to insert the new rules above that one for them to have any effect; change your -A INPUT to -I INPUT so the rules are inserted at the top of the chain instead of appended to the bottom.
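
For example, a lightly adjusted version of your loop (a sketch, assuming the same log path and that you want the drops at the very top of the chain) would be:

for ip in $(grep -ri Baidu /var/log/apache2/access.log | cut -f1 -d' ' | sort | uniq); do
    # -I with no position number inserts at the top of INPUT, ahead of the blanket ACCEPT
    iptables -I INPUT -s "$ip" -j DROP
done

You can check where the rules landed afterwards with iptables -L INPUT -n --line-numbers.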

You might also consider just blocking them in Apache based on the user agent instead - a 403 response might get the message across better than a failed connection, and it won't consume the resources you're currently spending to serve those requests with content.
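
If you go that route, something along these lines in your Apache configuration (or an .htaccess file) would return a 403 to that user agent - a sketch, assuming mod_rewrite is enabled; adjust the pattern as needed:

RewriteEngine On
# Match the Baiduspider user agent, case-insensitively, and refuse the request with 403 Forbidden
RewriteCond %{HTTP_USER_AGENT} Baiduspider [NC]
RewriteRule .* - [F,L]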