Robots.txt – Allow Crawling of Category/Product Pages and JS/CSS/Images


I'm new to configuring robots.txt and am trying to understand how best to set it up for our Magento EE install. Right now, I have the following:

### bots should have full access to category/product url-key paths
### bots should have full access to media (images), skin (images/js/css), js directories

User-agent: *
### Directories
Disallow: /app/
Disallow: /var/
Disallow: /downloader/
Disallow: /lib/
Disallow: /pkginfo/
### Paths (clean URLs)
Disallow: /*?
Disallow: /admin/
Disallow: /catalog/
Disallow: /catalogsearch/
Disallow: /checkout/
Disallow: /customer/
Disallow: /newsletter/
Disallow: /onestepcheckout/
Disallow: /report/
Disallow: /review/
Disallow: /wishlist/
### Files
Disallow: /api.php
Disallow: /apc_clear.php
Disallow: /index.php
Disallow: /cron.php
Disallow: /cron.sh
Disallow: /get.php
Disallow: /install.php
Disallow: /pi.php
Disallow: /LICENSE.html
Disallow: /LICENSE_AFL.txt
Disallow: /LICENSE_EE.html
Disallow: /LICENSE_EE.txt
Disallow: /LICENSE_SMD_ColorSwatch.txt
Disallow: /RELEASE_NOTES.txt

Sitemap: http://www.example.com/sitemap/sitemap.xml

Agents should still be able to crawl all clean web page URLs, like CMS pages and category/product url-key paths, correct? For example, these URLs/pages would not be blocked and are still crawlable:

www.example.com/shape/fedora/            #category view page  
www.example.com/shape/fedora/cool-hat    #product view page  
www.example.com/some-cms-page/url-key    #CMS page  
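One way to sanity-check this without waiting for a crawler is to run the rules through a small matcher. The sketch below is only an approximation: it assumes the wildcard semantics that Google and RFC 9309 describe (`*` matches any run of characters, a trailing `$` anchors the end of the URL), and other crawlers may interpret the file differently. The helper names `pattern_to_regex` and `is_blocked` are made up for illustration, and the query-string and `/media/` paths are hypothetical test inputs, not URLs from the site.

```python
import re

# Disallow patterns copied from the robots.txt above (single "User-agent: *" group).
DISALLOW = [
    "/app/", "/var/", "/downloader/", "/lib/", "/pkginfo/",
    "/*?", "/admin/", "/catalog/", "/catalogsearch/", "/checkout/",
    "/customer/", "/newsletter/", "/onestepcheckout/", "/report/",
    "/review/", "/wishlist/",
    "/api.php", "/apc_clear.php", "/index.php", "/cron.php", "/cron.sh",
    "/get.php", "/install.php", "/pi.php", "/LICENSE.html",
    "/LICENSE_AFL.txt", "/LICENSE_EE.html", "/LICENSE_EE.txt",
    "/LICENSE_SMD_ColorSwatch.txt", "/RELEASE_NOTES.txt",
]


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    Assumed semantics: '*' matches any run of characters and a trailing
    '$' anchors the end of the URL; everything else is literal.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_blocked(path):
    """Return True if any Disallow pattern matches from the start of the path.

    The file has no Allow rules, so a single matching Disallow is enough.
    """
    return any(pattern_to_regex(p).match(path) for p in DISALLOW)


for path in [
    "/shape/fedora/",                  # category view page
    "/shape/fedora/cool-hat",          # product view page
    "/some-cms-page/url-key",          # CMS page
    "/shape/fedora/?p=2",              # hypothetical paginated/filtered URL
    "/catalog/product/view/id/42",     # hypothetical non-rewritten product URL
    "/media/catalog/product/hat.jpg",  # hypothetical image path
]:
    print(path, "blocked" if is_blocked(path) else "allowed")
```

Under those assumptions the three clean URLs come out as allowed, as do paths under `/media/`, `/skin/` and `/js/`, since no rule mentions them. Anything with a query string is caught by `Disallow: /*?`, which is worth keeping in mind if paginated or filtered category pages should stay crawlable.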

Best Answer

I reckon Inchoo gives a good overview of this as well, with different options and examples to learn from.

Inchoo’s recommended Magento robots.txt boilerplate:

# Google Image Crawler Setup
User-agent: Googlebot-Image
Disallow:

# Crawlers Setup
User-agent: *

# Directories
Disallow: /404/
Disallow: /app/
Disallow: /cgi-bin/
Disallow: /downloader/
Disallow: /errors/
Disallow: /includes/
#Disallow: /js/
#Disallow: /lib/
Disallow: /magento/
#Disallow: /media/
Disallow: /pkginfo/
Disallow: /report/
Disallow: /scripts/
Disallow: /shell/
Disallow: /skin/
Disallow: /stats/
Disallow: /var/

# Paths (clean URLs)
Disallow: /index.php/
Disallow: /catalog/product_compare/
Disallow: /catalog/category/view/
Disallow: /catalog/product/view/
Disallow: /catalogsearch/
#Disallow: /checkout/
Disallow: /control/
Disallow: /contacts/
Disallow: /customer/
Disallow: /customize/
Disallow: /newsletter/
Disallow: /poll/
Disallow: /review/
Disallow: /sendfriend/
Disallow: /tag/
Disallow: /wishlist/
Disallow: /catalog/product/gallery/

# Files
Disallow: /cron.php
Disallow: /cron.sh
Disallow: /error_log
Disallow: /install.php
Disallow: /LICENSE.html
Disallow: /LICENSE.txt
Disallow: /LICENSE_AFL.txt
Disallow: /STATUS.txt

# Paths (no clean URLs)
#Disallow: /*.js$
#Disallow: /*.css$
Disallow: /*.php$
Disallow: /*?SID=

Source: Inchoo
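Two things in that boilerplate are easy to miss: a crawler that matches a named group (here `Googlebot-Image`, with an empty `Disallow:`) follows only that group, and every other crawler is blocked from `/skin/`, which the question wants left open. A quick check with Python's standard `urllib.robotparser` shows both. This is a rough sketch on a trimmed excerpt: the excerpt leaves out the wildcard rules because the standard-library parser does plain prefix matching, and `ExampleBot` and the asset URLs are hypothetical names for illustration.

```python
from urllib.robotparser import RobotFileParser

# Trimmed, wildcard-free excerpt of the boilerplate above.
EXCERPT = """\
# Google Image Crawler Setup
User-agent: Googlebot-Image
Disallow:

# Crawlers Setup
User-agent: *
Disallow: /skin/
Disallow: /var/
Disallow: /catalogsearch/
"""

parser = RobotFileParser()
parser.parse(EXCERPT.splitlines())

skin_url = "http://www.example.com/skin/frontend/logo.png"    # hypothetical asset
media_url = "http://www.example.com/media/catalog/hat.jpg"    # hypothetical asset

# Googlebot-Image matches its own group, whose empty Disallow allows everything.
print(parser.can_fetch("Googlebot-Image", skin_url))

# Any other crawler falls back to the "*" group and is blocked from /skin/.
print(parser.can_fetch("ExampleBot", skin_url))

# /media/ is commented out in the boilerplate, so it stays crawlable.
print(parser.can_fetch("ExampleBot", media_url))
```

If theme CSS, JS and images under `/skin/` should be crawlable for all bots, commenting out `Disallow: /skin/` in the same way `/js/` and `/media/` are commented out would bring the file in line with the goals stated in the question.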
