Limit number of requests to a specific set of URLs by IP address

apache-2.2, rate-limiting

I am working on a site that will let users download files; there will be around 2,000,000 downloadable files.

We want to discourage people from crawling the site and taking all of these documents, so we would like to limit the number of requests we serve for URLs matching a certain pattern within a given time period. We are happy for the rest of the site to be crawled, so we don't want to limit that.

We are adding an exclusion to robots.txt to discourage crawlers from fetching the files. We are more worried about malicious or misbehaving crawlers that ignore it.

We would like to use Apache to limit the number of document downloads to about 1 per minute per IP address.

Is there a best practice way to do this?

We are using CentOS with Apache 2.2.

There are a lot of similar questions, but most of them seem to center on bandwidth limiting, which is not what I want.

Best Answer

I don't think there is a module that limits connections per unit of time per IP, but you could experiment with mod_limitipconn and mod_cband; together they might be able to do it. Alternatively, you could combine mod_limitipconn (which caps simultaneous connections per IP, not requests per minute) with iptables.

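To do that, you could use iptables. Note that the plain "limit" match keeps a single counter shared by all clients, so for 1 per minute per IP you need the "hashlimit" match in srcip mode, followed by a rule that drops new connections over the limit (otherwise they simply fall through to the rest of the chain):

iptables -A INPUT -p tcp --dport 80 -m state --state NEW -m hashlimit --hashlimit-name docs --hashlimit-mode srcip --hashlimit-upto 1/minute --hashlimit-burst 1 -j ACCEPT
iptables -A INPUT -p tcp --dport 80 -m state --state NEW -j DROP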

I haven't tested this; it's just a hint of what to look into.
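Another way to express "1 per minute per IP" in iptables is the "recent" match. The sketch below assumes that match is available in your kernel; it remembers when each source IP last opened a connection and drops a new one if the same IP already connected within the last 60 seconds. The list name docs is a hypothetical choice.

```shell
# Sketch only: "docs" is a hypothetical list name; run as root.
# Drop a new connection to port 80 if this source IP opened one in the last 60 s.
# --rcheck (rather than --update) does not refresh the timestamp on dropped
# attempts, so a client is allowed again 60 s after its last accepted connection.
iptables -A INPUT -p tcp --dport 80 -m state --state NEW \
  -m recent --name docs --rcheck --seconds 60 --hitcount 1 -j DROP
# Otherwise record the source IP in the "docs" list and accept the connection.
iptables -A INPUT -p tcp --dport 80 -m state --state NEW \
  -m recent --name docs --set -j ACCEPT
```

Both rules are appended with -A, so they only take effect if no earlier rule in the INPUT chain already accepts new port 80 connections; you can check the existing order with iptables -L INPUT -n --line-numbers. The recent module also tracks a limited number of addresses per list by default (its ip_list_tot parameter), which may be too small for a busy site.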

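If you use iptables, you should give your main site and your document section two different IP addresses (separate virtual hosts), so that you can limit only the IP (virtual host) serving the documents. iptables works at the TCP level and cannot see the URL being requested, so a separate address is the simplest way for it to tell document downloads apart from the rest of the site.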
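Assuming the documents virtual host gets its own address (192.0.2.20 below is a hypothetical placeholder from the documentation range, and docs_ip is a hypothetical table name), a per-IP limit can be restricted to that destination so the main site is never throttled. This is a sketch, not a tested ruleset:

```shell
# Hypothetical address of the documents-only virtual host; replace with your own.
DOCS_IP=192.0.2.20

# Accept at most 1 new connection per minute from each source IP to the docs address.
# Older iptables releases may only accept the older --hashlimit spelling of --hashlimit-upto.
iptables -A INPUT -p tcp -d "$DOCS_IP" --dport 80 -m state --state NEW \
  -m hashlimit --hashlimit-name docs_ip --hashlimit-mode srcip \
  --hashlimit-upto 1/minute --hashlimit-burst 1 -j ACCEPT
# Drop the excess; connections to the main site's address do not match these rules.
iptables -A INPUT -p tcp -d "$DOCS_IP" --dport 80 -m state --state NEW -j DROP
```

Keep in mind that iptables counts new TCP connections rather than HTTP requests. With keep-alive, one connection can carry several downloads, so you may also want KeepAlive Off in the documents virtual host.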

Regards