My server is getting killed by Baiduspider, and no matter what I put in my robots.txt file, nothing happens. So, temporarily, I need to block as many of its IP addresses as I can via iptables. I am getting the IP addresses via:
grep -ri Baidu /var/log/apache2/access.log | cut -f1 -d' ' | sort | uniq
And my iptables rules look like:
Chain INPUT (policy ACCEPT)
target prot opt source destination
ACCEPT all -- anywhere anywhere
ACCEPT all -- anywhere anywhere ctstate RELATED,ESTABLISHED
ACCEPT tcp -- anywhere anywhere tcp dpt:ftp
ACCEPT tcp -- anywhere anywhere tcp dpt:ftp-data
ACCEPT tcp -- anywhere anywhere tcp dpt:ssh
ACCEPT tcp -- anywhere anywhere tcp dpt:www
ACCEPT tcp -- anywhere anywhere tcp dpt:https
ACCEPT tcp -- anywhere anywhere tcp dpt:snpp
ACCEPT tcp -- anywhere anywhere tcp dpt:mysql
LOG all -- anywhere anywhere limit: avg 5/min burst 5 LOG level debug prefix `iptables denied: '
DROP all -- anywhere anywhere
I'm planning on adding the new rules with something similar to:
for ip in $(grep -ri Baidu /var/log/apache2/access.log | cut -f1 -d' ' | sort | uniq); do iptables -A INPUT -s $ip -j DROP; done
But I don't think I can just append them; I think they need to be inserted at a specific location. Where do they need to be inserted to take effect?
Best Answer
Your first rule looks to be allowing everything, making all of the rules below it pointless (is this intended?).
You'd need to insert your new rules above that first rule for them to have any effect; change your -A INPUT to -I INPUT.
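Put together, the loop from the question with -A swapped for -I would look like the sketch below. Since iptables needs root, this version builds the commands from a hypothetical sample log line and prints them instead of running them; the sample log and IP stand in for the real /var/log/apache2/access.log.

```shell
# Stand-in for /var/log/apache2/access.log with one hypothetical Baiduspider hit.
log=$(mktemp)
printf '1.2.3.4 - - [01/Jan/2012] "GET / HTTP/1.1" 200 512 "-" "Baiduspider"\n' > "$log"

# -I INPUT 1 inserts each rule at the top of the chain, ahead of the blanket
# ACCEPT; -A would append below the final DROP, where it never matches.
# Drop the leading 'echo' inside the loop to actually apply the rules as root.
cmds=$(grep -i Baidu "$log" | cut -f1 -d' ' | sort -u \
       | while read -r ip; do echo "iptables -I INPUT 1 -s $ip -j DROP"; done)
echo "$cmds"
rm -f "$log"
```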
You might also consider just blocking them in Apache based on user agent instead - a 403 response might get the message across better than a failed connection, and it won't consume the resources you're currently spending to answer those requests with full content.
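For the Apache route, here's a sketch of an Apache 2.2-style fragment using mod_setenvif and mod_authz_host; the directory path and the "bad_bot" variable name are placeholders, not from the question.

```apache
# Tag any request whose User-Agent mentions Baiduspider, then deny it
# with a 403 instead of serving content.
SetEnvIfNoCase User-Agent "Baiduspider" bad_bot
<Directory "/var/www">
    Order Allow,Deny
    Allow from all
    Deny from env=bad_bot
</Directory>
```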