Magento – catalogsearch URL being spammed by Chinese bots

catalogsearch, search

We are constantly getting hits on our site search (catalogsearch) URL from what appear to be bots, mainly from China, e.g.

/catalogsearch/result/?q=大奖娱乐88tb88手机版+Q82019309.com.com

I have blocked a lot of IP ranges for China, Russia and Ukraine. Some individual IP addresses from other countries, like Germany, are also blocked.

The only one that seems to be active now claims to be Googlebot, with several addresses in the 66.249.76.114 range.

Other than IP blocking, all I could think of was copying Data.php to

/app/code/local/ from /app/code/core/Mage/CatalogSearch/Helper/ 

and changing the public function getQueryText() to catch searches that had .com in the string.

When a search matches, I replace the query with a random entry from a small array of products that better fits what the store sells. Only tested so far, see below!

if ($this->_queryText === null) {
   $this->_queryText = '';
} elseif (strtolower(strpos($this->_queryText, '.com')) !== false) {
   $botitems = Array("* product 1","* product 2","* product 3","* product 4","* product 5","* product 6","* product 7","* product 8","* product 9");
   $this->_queryText = $botitems[rand(0, count($botitems) - 1)];
} else {
   /* @var $stringHelper Mage_Core_Helper_String */
   $stringHelper = Mage::helper('core/string');

Questions are:

  1. Is there any other way to stop bots using the search string?
  2. Will these hits cause any problems with site ranking?
  3. Other than IP blocking, is changing the search query when they hit a good or bad idea?
  4. Is the bot that looks like a Googlebot legit?
  5. What are they up to?

Thanks in advance for any help.

Best Answer

Some context: we have been having the same issue for two weeks, with the same URL (Q82019309.com) and variations of it. We started by blocking all IP ranges from China and Russia like you did, and also added this bot list, with no success.

To answer your questions: 1) We extended getQueryText like you suggested and kill the search if the parameter contains ".com".

Here is the code we used:

elseif (strpos(strtolower($this->_queryText), '.com') !== false) {
    die();
} else {

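Your code will always return true if you don't switch strpos & strtolower: strtolower() applied to the result of strpos() always gives a string, and a string is never identical to false. (It matched every search on my end.)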
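To make the difference between the two call orders concrete, here is a minimal stand-alone sketch. The function names containsDotComBuggy() and containsDotCom() are made up for this example; in Magento the check sits inside getQueryText() and works on $this->_queryText.

```php
<?php
// Hypothetical stand-alone helpers for illustration only; they are not part of Magento.

// Call order from the question: strtolower() wraps the strpos() result.
function containsDotComBuggy($queryText)
{
    // strpos() returns an int or false. strtolower() converts either one
    // to a string, and a string is never identical to false,
    // so this comparison is true for every input.
    return strtolower(strpos($queryText, '.com')) !== false;
}

// Swapped order: lowercase the text first, then search it.
function containsDotCom($queryText)
{
    return strpos(strtolower($queryText), '.com') !== false;
}

var_dump(containsDotComBuggy('red shoes'));      // bool(true), a false positive
var_dump(containsDotCom('red shoes'));           // bool(false)
var_dump(containsDotCom('spam Q82019309.COM'));  // bool(true)
```

PHP's built-in stripos() does the case-insensitive search in one call, so stripos($queryText, '.com') !== false would be an equivalent way to write the corrected check.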

2) It might in the long run, if your Magento website ends up with thousands of spammy search result pages. Once you have extended your search function, don't forget to delete the spammy result pages already created in Magento.

3) As stated earlier, IP blocking has been an unsuccessful cat-and-mouse game, and we ended up doing the same as you.

4) This bot is not legit. The only reason we found for this behaviour was some lame black-hat SEO to promote the URL in the query (Magento creates a page for the search term, Google indexes that page, and the advertised site ranks higher).
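Whether a particular hit really comes from Googlebot can also be checked instead of judged from the traffic pattern: Google documents a reverse DNS lookup on the IP followed by a forward lookup on the returned hostname. Below is a rough sketch of that check in plain PHP. The function name isVerifiedGooglebot() and its two resolver parameters are assumptions made for this example (they let the lookups be swapped out for testing); gethostbyaddr() and gethostbyname() are standard PHP functions, and gethostbyname() only handles IPv4.

```php
<?php
// Sketch of a reverse-then-forward DNS check. The function name and the
// resolver parameters are hypothetical, chosen for this example.
function isVerifiedGooglebot($ip, $reverseLookup = 'gethostbyaddr', $forwardLookup = 'gethostbyname')
{
    // Step 1: reverse lookup. gethostbyaddr() returns the IP unchanged when
    // there is no PTR record, and false on malformed input.
    $host = call_user_func($reverseLookup, $ip);
    if ($host === false || $host === $ip) {
        return false;
    }

    // Step 2: the hostname must end in .googlebot.com or .google.com.
    if (!preg_match('/\.(googlebot|google)\.com$/i', $host)) {
        return false;
    }

    // Step 3: the forward lookup of that hostname must lead back to the same IP,
    // otherwise the PTR record could have been faked by the IP's owner.
    return call_user_func($forwardLookup, $host) === $ip;
}
```

Calling isVerifiedGooglebot('66.249.76.114') with the default resolvers performs real DNS queries, so the result depends on the network it runs from. A user agent string alone proves nothing, since any client can send one.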

To be totally fair, I won't promise that's the best solution, but it worked with no downside in our case.

Let me know if you find another solution.
