Search Engine Bot – Large amount of hits

rate-limiting search-engine user-agent

I've started tracking user-agent strings on a website at the start of each session. Looking at the data for this month so far, I'm seeing one search engine bot that keeps coming up a lot.

Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)

From 9/1/2011 to 9/13/2011 I've logged 2090 hits from this user-agent. From other search engines I'm seeing much lower hit counts:

Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp) – 353

Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html) – 175

Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) – 110

www.baidu.com seems to be a Chinese version of Google. Is there a way to throttle their bot? I don't mind them indexing us… in fact it's probably a good thing, as we have a large Asian population using the site, but they seem to be crawling far more heavily than the others.

Best Answer

You want to throttle the bot, but you don't appear to know WHY you want to do this.
Are you experiencing a performance impact? Is the traffic pushing you over a bandwidth or transfer threshold?

Throttling a bot "just because" is a waste of effort. If it isn't hurting you, I suggest you leave it alone.

If it is causing problems, you can take steps such as sitemap.xml change-frequency hints to suggest how often the bot should crawl, or a robots.txt directive to limit the crawl rate. Note that both of these can be ignored, which would leave you only the option of blocking the user agent using (e.g.) an Apache mod_rewrite rule -- but this would also result in your site not being indexed...
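For reference, the two approaches above might look roughly like this. Crawl-delay is a nonstandard robots.txt extension (Baidu has documented support for it, but honoring it is entirely at the bot's discretion), and the delay value here is just an illustrative guess:

```
# robots.txt at the site root -- a polite request, not an enforcement mechanism
User-agent: Baiduspider
Crawl-delay: 10
```

If the bot ignores that and you decide to block it outright, an Apache mod_rewrite rule along these lines (in the vhost config or an .htaccess file) would return 403 Forbidden to that user agent; again, this drops you from Baidu's index entirely:

```
# Block Baiduspider by user-agent string (last resort)
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} Baiduspider [NC]
RewriteRule .* - [F,L]
```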

Related Topic