Need Advice for Optimizing Robots.txt File


I've noticed my indexed pages are dropping from Google like a rock, so I'm reviewing everything that changed in the last month. In Google Webmaster Tools I noticed some inaccessible pages; to compensate, I blocked them with my robots.txt file, see below:

User-agent: Baiduspider
Disallow: /
User-agent: * 
Disallow: /index.php/ 
Disallow: /*?
Disallow: /*.js$
Disallow: /*.css$ 
Disallow: /checkout/ 
Disallow: /tag/ 
Disallow: /catalogsearch/ 
Disallow: /review/ 
Disallow: /app/ 
Disallow: /downloader/ 
Disallow: /js/ 
Disallow: /lib/ 
Disallow: /media/ 
Disallow: /*.php$ 
Disallow: /pkginfo/ 
Disallow: /report/ 
Disallow: /skin/ 
Disallow: /var/ 
Disallow: /customer/ 
Disallow: /productdata/
Disallow: /productscripts/
Disallow: /includes/
Disallow: /wishlist/
Disallow: /shop/
Disallow: /supplier/
Disallow: /eng/catalog/gallery/
Disallow: /fr/catalog/gallery
Disallow: /eng/catalog/product/gallery/
Disallow: /fr/catalog/product/gallery
Disallow: /eng/cms/index/noCookies
Disallow: /fr/cms/index/noCookies
Disallow: /eng/catalog/view/_ignore_category/
Disallow: /fr/catalog/view/_ignore_category/
Disallow: /eng/customer/
Disallow: /fr/customer/
Disallow: /catalog/category/view/
Disallow: /catalog/product/view/
Disallow: /*?dir*
Disallow: /*?dir=desc
Disallow: /*?dir=asc
Disallow: /*?limit=all
Disallow: /*?mode*

Is my robots.txt file too restrictive?

Best Answer

The best, most complete answer to your question is available in Google Webmaster Tools itself, under Crawl > Blocked URLs. Depending on the number of results on your category pages and your pagination configuration, the

Disallow: /*?

line will block Googlebot from crawling anything but the first page of a category list, because every subsequent page is reached through a query string (e.g. ?p=2).
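To see what that pattern actually catches, here is a minimal sketch of Googlebot-style wildcard matching, assuming the documented semantics ('*' matches any run of characters, a trailing '$' anchors the end of the URL). The store paths are hypothetical examples, and this is an illustration, not a full robots.txt parser:

import re

def rule_matches(rule, url_path):
    # Translate a robots.txt rule into a regex: escape everything,
    # then let '*' match any run of characters and honor a trailing
    # '$' as an end-of-URL anchor.
    pattern = re.escape(rule).replace(r"\*", ".*")
    if pattern.endswith(r"\$"):
        pattern = pattern[:-2] + "$"
    return re.match(pattern, url_path) is not None

# Hypothetical store paths -- only the bare category URL survives /*?
for path in ("/shop/dresses", "/shop/dresses?p=2", "/shop/dresses?dir=asc"):
    print(path, "->", "blocked" if rule_matches("/*?", path) else "allowed")

Running this shows page 2 and every sorted view blocked, which is exactly how a category's deeper pages fall out of the crawl.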

Also, I would not block access to /media/, or at least not to /media/catalog/ or /media/wysiwyg/. Images have value too!
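If you want to keep the broad /media/ block for the rest of that directory, one option is to re-open just the image folders with Allow rules. Googlebot applies the most specific (longest) matching rule, so the longer Allow paths win over the shorter Disallow; the exact folders depend on your Magento setup:

User-agent: *
Allow: /media/catalog/
Allow: /media/wysiwyg/
Disallow: /media/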