Robots.txt – Allow Crawling of Category/Product Pages and JS/CSS/Images

I'm new to configuring robots.txt and am trying to understand how best to set it up for our Magento EE install. Right now, I have the following:

### bots should have full access to category/product url-key paths
### bots should have full access to media (images), skin (images/js/css), js directories

User-agent: *
### Directories
Disallow: /app/
Disallow: /var/
Disallow: /downloader/
Disallow: /lib/
Disallow: /pkginfo/
### Paths (clean URLs)
Disallow: /*?
Disallow: /admin/
Disallow: /catalog/
Disallow: /catalogsearch/
Disallow: /checkout/
Disallow: /customer/
Disallow: /newsletter/
Disallow: /onestepcheckout/
Disallow: /report/
Disallow: /review/
Disallow: /wishlist/
### Files
Disallow: /api.php
Disallow: /apc_clear.php
Disallow: /index.php
Disallow: /cron.php
Disallow: /cron.sh
Disallow: /get.php
Disallow: /install.php
Disallow: /pi.php
Disallow: /LICENSE.html
Disallow: /LICENSE_AFL.txt
Disallow: /LICENSE_EE.html
Disallow: /LICENSE_EE.txt
Disallow: /LICENSE_SMD_ColorSwatch.txt
Disallow: /RELEASE_NOTES.txt

Sitemap: http://www.example.com/sitemap/sitemap.xml

With these rules, user agents should still be able to crawl all clean page URLs, such as CMS pages and category/product url-key paths, correct? For example, these URLs would not be blocked and would remain crawlable:

www.example.com/shape/fedora/            #category view page  
www.example.com/shape/fedora/cool-hat    #product view page  
www.example.com/some-cms-page/url-key    #CMS page
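
One way to sanity-check this is to run the paths through a small matcher. The following is only a minimal sketch, assuming the wildcard semantics described in RFC 9309 and Google's robots.txt documentation (`*` matches any run of characters, a trailing `$` anchors the end, everything else is a literal prefix match). The helper names `pattern_to_regex` and `is_blocked` are made up for illustration, and the asset path with `?v=2` is a hypothetical example, not taken from the site. Because the file above contains only Disallow rules, Allow precedence is not modeled.

```python
import re

# Subset of the Disallow rules from the "User-agent: *" group above.
DISALLOW = [
    "/app/", "/var/", "/downloader/", "/lib/", "/pkginfo/",
    "/*?", "/admin/", "/catalog/", "/catalogsearch/", "/checkout/",
    "/customer/", "/review/", "/wishlist/", "/index.php",
]

def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    '*' becomes '.*', a trailing '$' anchors the end of the path, and all
    other characters are matched literally from the start of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def is_blocked(path, disallow=DISALLOW):
    """Return True if any Disallow pattern matches the path (plus query string)."""
    return any(pattern_to_regex(p).match(path) for p in disallow)

if __name__ == "__main__":
    for path in [
        "/shape/fedora/",                               # category view page
        "/shape/fedora/cool-hat",                       # product view page
        "/some-cms-page/url-key",                       # CMS page
        "/shape/fedora/?p=2",                           # same category, with a query string
        "/skin/frontend/default/css/styles.css?v=2",    # hypothetical versioned asset
    ]:
        print(path, "->", "blocked" if is_blocked(path) else "crawlable")
```

Under these assumptions the three clean URLs come out as crawlable, while anything containing a `?` is caught by `Disallow: /*?`. That includes paginated or filtered category URLs and any JS/CSS/image URL that carries a query string, which may matter for the goal of giving bots full access to assets.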

Inchoo’s recommended Magento robots.txt boilerplate:

# Google Image Crawler Setup
User-agent: Googlebot-Image
Disallow:

# Crawlers Setup
User-agent: *

# Directories
Disallow: /404/
Disallow: /app/
Disallow: /cgi-bin/
Disallow: /downloader/
Disallow: /errors/
Disallow: /includes/
#Disallow: /js/
#Disallow: /lib/
Disallow: /magento/
#Disallow: /media/
Disallow: /pkginfo/
Disallow: /report/
Disallow: /scripts/
Disallow: /shell/
Disallow: /skin/
Disallow: /stats/
Disallow: /var/

# Paths (clean URLs)
Disallow: /index.php/
Disallow: /catalog/product_compare/
Disallow: /catalog/category/view/
Disallow: /catalog/product/view/
Disallow: /catalogsearch/
#Disallow: /checkout/
Disallow: /control/
Disallow: /contacts/
Disallow: /customer/
Disallow: /customize/
Disallow: /newsletter/
Disallow: /poll/
Disallow: /review/
Disallow: /sendfriend/
Disallow: /tag/
Disallow: /wishlist/
Disallow: /catalog/product/gallery/

# Files
Disallow: /cron.php
Disallow: /cron.sh
Disallow: /error_log
Disallow: /install.php
Disallow: /LICENSE.html
Disallow: /LICENSE.txt
Disallow: /LICENSE_AFL.txt
Disallow: /STATUS.txt

# Paths (no clean URLs)
#Disallow: /*.js$
#Disallow: /*.css$
Disallow: /*.php$
Disallow: /*?SID=
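
When comparing this boilerplate with the goals stated at the top of my own file, it helps to see which rules are actually active, since several lines are commented out and there are two user-agent groups. The sketch below is a rough illustration only: `parse_groups` and `blocking_rules` are hypothetical helpers, the excerpt is a shortened copy of the boilerplate, the asset paths are made-up examples, and matching is plain prefix matching with no wildcard support.

```python
def parse_groups(text):
    """Map each user-agent to its list of (directive, path) rules.

    Anything after '#' is treated as a comment, so commented-out rules
    such as '#Disallow: /js/' are skipped.
    """
    groups, agents, in_rules = {}, [], False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if ":" not in line:
            continue
        field, value = (s.strip() for s in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a user-agent line after rules starts a new group
                agents, in_rules = [], False
            agents.append(value)
            groups.setdefault(value, [])
        elif field in ("allow", "disallow") and agents:
            in_rules = True
            for agent in agents:
                groups[agent].append((field, value))
    return groups

def blocking_rules(path, rules):
    """List the non-empty Disallow prefixes that match the given path."""
    return [p for d, p in rules if d == "disallow" and p and path.startswith(p)]

# Shortened excerpt of the boilerplate above.
EXCERPT = """\
# Google Image Crawler Setup
User-agent: Googlebot-Image
Disallow:

# Crawlers Setup
User-agent: *
#Disallow: /js/
#Disallow: /media/
Disallow: /skin/
Disallow: /var/
"""

groups = parse_groups(EXCERPT)

if __name__ == "__main__":
    for path in ["/skin/frontend/base/default/css/styles.css",  # illustrative paths
                 "/js/prototype/prototype.js",
                 "/media/catalog/product/hat.jpg"]:
        print(path, "->", blocking_rules(path, groups["*"]) or "no matching Disallow")
```

Read this way, the `*` group leaves `/js/` and `/media/` open but still disallows `/skin/`, which conflicts with my goal of letting bots fetch skin images, JS, and CSS. The `Googlebot-Image` group has an empty `Disallow:`, which blocks nothing for that crawler.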
