Facebook Crawler Bot Crashing Site

bots · facebook · web-crawler

Did Facebook just implement some kind of web crawler? My website has crashed a couple of times over the past few days, severely overloaded by IPs that I've traced back to Facebook.

I have tried googling around but can't find any definitive resource on controlling Facebook's crawler bot via robots.txt. There is a reference that suggests adding the following:

User-agent: facebookexternalhit/1.1
Crawl-delay: 5

User-agent: facebookexternalhit/1.0
Crawl-delay: 5

User-agent: facebookexternalhit/*
Crawl-delay: 5

But I can't find any specific reference on whether the Facebook bot respects robots.txt. According to older sources, Facebook "does not crawl your site". That is definitely false, as my server logs show it crawling my site from a dozen or more IPs in the ranges 69.171.237.0/24 and 69.171.229.115/24, at a rate of many pages per second.

I can't find any documentation on this, and I suspect it is something Facebook rolled out in the past few days, since my server has never crashed like this before.

Can someone please advise?

Best Answer

As discussed in this similar question on Facebook and Crawl-delay, Facebook does not consider itself a bot, and doesn't even request your robots.txt, much less pay attention to its contents.

You can implement your own rate-limiting code as shown in the linked question. The idea is simply to return HTTP status 503 when your server is over capacity, or when it is being inundated by a particular user agent.
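As a rough illustration (not the exact code from the linked question), here is a minimal sketch of that idea for a Python/WSGI site. The class name, thresholds, and Retry-After value are all placeholders you would tune for your own traffic; it matches the user-agent substring "facebookexternalhit" seen in the logs above.

import time
from collections import deque

class UserAgentThrottle:
    """Hypothetical WSGI middleware: return 503 when a given user agent
    exceeds a request-rate threshold (per-process state, not shared
    across workers)."""

    def __init__(self, app, agent_substring="facebookexternalhit",
                 max_requests=10, per_seconds=60):
        self.app = app
        self.agent_substring = agent_substring
        self.max_requests = max_requests
        self.per_seconds = per_seconds
        self.hits = deque()  # timestamps of recent matching requests

    def __call__(self, environ, start_response):
        ua = environ.get("HTTP_USER_AGENT", "")
        if self.agent_substring in ua:
            now = time.time()
            # Drop timestamps that have fallen out of the sliding window.
            while self.hits and now - self.hits[0] > self.per_seconds:
                self.hits.popleft()
            if len(self.hits) >= self.max_requests:
                # Ask the crawler to back off; Retry-After is advisory.
                start_response("503 Service Unavailable",
                               [("Content-Type", "text/plain"),
                                ("Retry-After", "600")])
                return [b"Over capacity, please retry later.\n"]
            self.hits.append(now)
        return self.app(environ, start_response)

You would wrap your existing WSGI application with it, e.g. application = UserAgentThrottle(application). If your site isn't Python, the same pattern (count recent requests per user agent, answer 503 with Retry-After once a threshold is exceeded) can be reproduced in whatever stack you run.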

It appears that people working for huge tech companies don't understand that "improve your caching" is something small companies don't have the budget for. We are focused on serving the customers who actually pay us, and don't have time to fend off rampaging web bots from "friendly" companies.
