Magento – Magento setting to remove Google-indexed URLs that had parameters


I have been setting up a new Magento site for a client, and while I was still learning my way around Magento (I'm new to it), Google managed to crawl the entire site. It has indexed 7,000+ entries for a site with only about 46 pages in its sitemap.

Looking further, most of them are individual combinations of the URL parameters available on each page.

I also tidied up the URLs for SEO by removing the .html suffix, but the .html versions were indexed too and now return 404s.

I have the correct robots.txt in place to prevent this from happening.

The problem is that, because Google beat me to it, I now have thousands of indexed pages returning 404 errors.

Some reading suggests that blocking them in robots.txt won't remove the ones already in the index, so I was wondering whether anyone knows a Magento-specific way to mark those URLs as NOINDEX, rather than removing them from the index one at a time.

My idea was to allow those pages through robots.txt, use .htaccess to mark every URL with certain parameters as NOINDEX so that Google drops them from the index, and then, once they are gone, block them again in robots.txt. But I'm not sure which solution would be best.

Any ideas?

Best Answer

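There are three steps to this:

  1. Add a noindex tag to the unwanted URLs.

  2. Add canonical tags that point each parameterized URL at its clean version.

  3. Once those URLs have dropped out of the index, add them to robots.txt with a Disallow rule. Doing this last matters: Google cannot crawl a URL that robots.txt blocks, so it would never see the noindex tag.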

Wait a few days for the changes to take effect, then ignore the 404 errors or mark them as fixed.
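For step 1, the noindex can also be sent as an `X-Robots-Tag` HTTP header from .htaccess, which is close to what the question proposes and avoids editing Magento's templates. This is a sketch assuming Apache 2.4 or later (needed for `<If>`) with mod_headers enabled. The parameter names are examples only (Magento 1's product-list toolbar uses `dir`, `order`, `limit` and `mode`); substitute the ones that actually appear in your indexed URLs.

```apache
# Sketch: send "noindex" for URLs that carry sorting/display parameters.
# Assumes Apache 2.4+ and mod_headers; the parameter list is an example.
<IfModule mod_headers.c>
    <If "%{QUERY_STRING} =~ /(^|&)(dir|order|limit|mode)=/">
        # "follow" still lets Google follow the links on these pages
        Header set X-Robots-Tag "noindex, follow"
    </If>
</IfModule>
```

Google only sees this header when it is allowed to crawl the URL, so these URLs must not be disallowed in robots.txt until they have left the index.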

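For the change from .html to extensionless URLs, you need a 301 redirect. Add these rules near the top of Magento's .htaccess, before its own rewrite to index.php:

RewriteEngine On
# 301-redirect requests for "page.html" to "page"; the query string is kept.
# THE_REQUEST is the raw request line, so internal rewrites cannot loop back here.
RewriteCond %{THE_REQUEST} \s/([^?\s]*)\.html[?\s] [NC]
RewriteRule ^ /%1 [R=301,L]
# Rewrite valid requests on .html files: serve "page" from page.html if that file exists
RewriteCond %{REQUEST_FILENAME}.html -f
RewriteRule ^ %{REQUEST_URI}.html [L]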