R – How I do to block Web scraping without blocking Well behaved bots

botsscreen-scrapingSecurityseo

I'm building an e-commerce website with a large database of products. Of course, is nice when Goggle indexes all products of the website. But what if some competitor wants Web Scrape the website and get all images and product descriptions?

I was observing some websites with similar lists of products, and they place a CAPTCHA, so "only humans" can read the list of products. The drawback is… it is invisible for Google, Yahoo or another "Well behaved" bots.

Best Answer

You can discover the IP addresses the Google and others are using by checking visitor IPs with whois (in the command line or on a web site). Then, once you've accumulated a stash of legit search engines, allow them into your product list without the CAPTCHA.

Related Topic