How to use fail2ban to block scrapers

Tags: apache-2.2, block, fail2ban, rate-limiting, scraping

I run a media site and have a problem with users coming along and scraping all of the content. I placed an invisible URL on the page to catch spiders and immediately block their IP, but some people have figured out the URL scheme and are writing their own scripts around it.

All the fail2ban filters I have seen so far deal with failed login attempts, but I want something more advanced that will detect abusers and then rate-limit and/or block them. The URLs the scrapers request are all valid, so if they go slowly enough I won't be able to tell them apart from legitimate users, but I imagine fail2ban can at least keep the amateurs out.

How can I implement this kind of filter in fail2ban the right way while minimizing false positives for legitimate users?
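
This is roughly what I have in mind so far; the trap URL, log path, filter names, and limits below are placeholders rather than a tested setup:

    # /etc/fail2ban/filter.d/apache-trap.conf  (hypothetical trap-URL filter)
    [Definition]
    # any request for the hidden trap URL counts as a match
    failregex = ^<HOST> -.*"(GET|POST) /my-trap-url
    ignoreregex =

    # /etc/fail2ban/filter.d/apache-any-get.conf  (matches every GET, used only for rate counting)
    [Definition]
    failregex = ^<HOST> -.*"GET .* HTTP
    ignoreregex =

    # /etc/fail2ban/jail.local  (example stanzas; thresholds are guesses)
    [apache-trap]
    enabled  = true
    port     = http,https
    filter   = apache-trap
    logpath  = /var/log/apache2/access.log
    maxretry = 1
    bantime  = 86400

    # crude rate limit against the whole access log:
    # more than ~300 requests in 60 seconds earns a one-hour ban
    [apache-scrape-rate]
    enabled  = true
    port     = http,https
    filter   = apache-any-get
    logpath  = /var/log/apache2/access.log
    findtime = 60
    maxretry = 300
    bantime  = 3600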

Best Answer

I'm not really sure fail2ban is the right tool here; you might want to look at something like mod_security (http://www.modsecurity.org/). It lets you track requests in a per-session or per-IP context, define rules that describe suspect traffic, and then deny or slow it accordingly.
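
As a very rough sketch of what that could look like with mod_security 2.x (the rule IDs, the 120-requests-per-minute threshold, and the /media/ prefix are only examples, and persistent collections also need SecDataDir configured):

    # count requests per client IP and block anything that looks like bulk scraping
    SecAction "phase:1,id:1000,nolog,pass,initcol:ip=%{REMOTE_ADDR}"

    # increment a per-IP counter on every content request, expiring after 60 s
    SecRule REQUEST_URI "@beginsWith /media/" \
        "phase:1,id:1001,nolog,pass,setvar:ip.requests=+1,expirevar:ip.requests=60"

    # deny clients that exceed ~120 requests per minute
    SecRule IP:REQUESTS "@gt 120" \
        "phase:1,id:1002,deny,status:403,log,msg:'Possible scraper: request rate exceeded'"

If outright blocking is too aggressive, mod_security's pause action lets you slow suspect clients down instead of denying them.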

EDIT: You didn't specify, so I'm just assuming that you're using Apache.
