Magento – Google Ignores Magento’s robots.txt

robots.txt seo

I have had the following in my robots.txt for months now:

Disallow: /catalogsearch/
Disallow: /catalogsearch/*
Disallow: /webshop/catalogsearch/*
Disallow: /webshop/catalogsearch/

My webshop URL looks like www.webshopdomain.com/webshop

The problem is that Google is still crawling and showing all of the search result pages (around 39,000).

What am I doing wrong?

Best Answer

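The rules you have in your robots.txt file are spot on. Keep in mind that Disallow only stops crawling; URLs that are already in the index can stay there. Submitting the URLs for removal from the search index in your Google Webmaster Tools account is the best way to stop them from showing.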
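To sanity-check which paths the four rules from the question cover, here is a minimal sketch in Python. It assumes the wildcard conventions Google documents for robots.txt (`*` matches any run of characters, a trailing `$` anchors the end of the URL, everything else is a prefix match). The function names and the sample paths are made up for illustration, and the sketch ignores Allow rules and rule precedence.

```python
import re
from typing import List, Pattern


def rule_to_regex(rule_path: str) -> Pattern[str]:
    """Translate one robots.txt path pattern into a compiled regex."""
    anchored = rule_path.endswith("$")
    if anchored:
        rule_path = rule_path[:-1]
    # Escape the literal pieces and let each '*' match any run of characters.
    body = ".*".join(re.escape(part) for part in rule_path.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_blocked(url_path: str, disallow_rules: List[str]) -> bool:
    """Return True if any Disallow rule matches from the start of url_path."""
    return any(rule_to_regex(rule).match(url_path) for rule in disallow_rules)


# The four rules from the question.
RULES = [
    "/catalogsearch/",
    "/catalogsearch/*",
    "/webshop/catalogsearch/*",
    "/webshop/catalogsearch/",
]

# Hypothetical example paths, not taken from the real site.
print(is_blocked("/webshop/catalogsearch/result/?q=shoes", RULES))  # True
print(is_blocked("/webshop/some-product.html", RULES))  # False
```

Because matching is prefix-based, the rules ending in `/*` cover the same paths as the ones without the wildcard, so one rule per directory would be enough.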

There are a number of other URLs and prefixes you need to think about too.

The following is a more or less optimal set of default entries for a Magento robots.txt file:

# This file is to prevent the crawling and indexing of certain parts
# of your site by web crawlers and spiders run by search engines.

# Google Image Crawler Setup
User-agent: Googlebot-Image
Disallow:

# Crawlers Setup
User-agent: *

# Directories
Disallow: /404/
Disallow: /app/
Disallow: /cgi-bin/
Disallow: /downloader/
Disallow: /errors/
Disallow: /includes/
Disallow: /js/
Disallow: /lib/
Disallow: /magento/
Disallow: /media/
Disallow: /pkginfo/
Disallow: /report/
Disallow: /scripts/
Disallow: /shell/
Disallow: /skin/
Disallow: /stats/
Disallow: /var/

# Paths (clean URLs)
Disallow: /index.php/
Disallow: /catalog/product_compare/
Disallow: /catalog/category/view/
Disallow: /catalog/product/view/
Disallow: /catalogsearch/
Disallow: /checkout/
Disallow: /control/
Disallow: /contacts/
Disallow: /customer/
Disallow: /customize/
Disallow: /newsletter/
Disallow: /poll/
Disallow: /review/
Disallow: /sendfriend/
Disallow: /tag/
Disallow: /wishlist/
Disallow: /catalog/product/gallery/

# Files
Disallow: /cron.php
Disallow: /cron.sh
Disallow: /error_log
Disallow: /install.php
Disallow: /LICENSE.html
Disallow: /LICENSE.txt
Disallow: /LICENSE_AFL.txt
Disallow: /STATUS.txt

# Paths (no clean URLs)
Disallow: /*.js$
Disallow: /*.css$
Disallow: /*.php$
Disallow: /*?SID=
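Rule paths in robots.txt are matched from the root of the host, so for a store served from a subdirectory (such as /webshop in the question) the literal paths above would need that prefix. The following is a rough sketch of adding it; `prefix_rules` is a hypothetical helper, and it leaves wildcard rules untouched on the assumption that a leading `/*` already matches under any directory.

```python
from typing import List


def prefix_rules(robots_lines: List[str], prefix: str) -> List[str]:
    """Prepend prefix to every literal (non-wildcard) Disallow path."""
    prefix = prefix.rstrip("/")
    out = []
    for line in robots_lines:
        field, sep, value = line.partition(":")
        path = value.strip()
        is_disallow = bool(sep) and field.strip().lower() == "disallow"
        # Skip empty Disallow lines and wildcard patterns.
        if is_disallow and path.startswith("/") and "*" not in path:
            out.append("Disallow: " + prefix + path)
        else:
            out.append(line)
    return out


sample = [
    "User-agent: *",
    "Disallow: /checkout/",
    "Disallow: /cron.php",
    "Disallow: /*?SID=",
]
for rewritten in prefix_rules(sample, "/webshop"):
    print(rewritten)
```

The file itself still has to live at the root of the host, for example www.webshopdomain.com/robots.txt, since crawlers do not look for it inside subdirectories.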