I'm trying to scrape product listing pages that display the vendors and prices of particular products, but urllib.urlopen isn't working. It works on all other pages on Amazon, so I'm wondering whether Amazon blocks bots from scraping product listing pages. Can anyone verify this? Using Chrome I can still view page source…
Here's an example of a product listing page I would want to scrape: http://www.amazon.com/gp/offer-listing/B007E84H96/ref=dp_olp_new?ie=UTF8&condition=new
Best Answer
Trying curl -I on that URL returns MethodNotAllowed, and adding a User-Agent string with the -A switch didn't affect that return value.
You might experiment with different HTTP headers to see if you can find something that passes. But it's pretty obvious that Amazon wouldn't want you to screen scrape prices from their product pages. And a little googling brings up this page:
http://www.distil.it/amazon-cracks-down-on-price-scraping/#.URvBFo4ry0s
Note also that Amazon has an API for their affiliates; there are some related questions about using that API from Python in the "Related" question links in the right column.
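If you do want to experiment with headers as suggested above, here is a minimal sketch in Python 3 using urllib.request. Note that curl -I sends a HEAD request, so MethodNotAllowed may only mean that HEAD is rejected; the sketch therefore tries both HEAD and GET. The header sets in CANDIDATE_HEADERS and the helper names (build_request, try_request, probe) are my own illustrative assumptions, not anything Amazon documents, and there is no guarantee that any combination gets through.

```python
import urllib.error
import urllib.request

# URL taken from the question.
URL = ("http://www.amazon.com/gp/offer-listing/B007E84H96/"
       "ref=dp_olp_new?ie=UTF8&condition=new")

# Hypothetical header sets to try; these are guesses, not known-good values.
CANDIDATE_HEADERS = [
    {},
    {"User-Agent": "Mozilla/5.0"},
    {
        "User-Agent": "Mozilla/5.0",
        "Accept": "text/html",
        "Accept-Language": "en-US,en;q=0.8",
    },
]


def build_request(url, headers, method="GET"):
    """Build a request with the given headers and HTTP method."""
    return urllib.request.Request(url, headers=headers, method=method)


def try_request(request, timeout=10):
    """Send the request; return the HTTP status code, or None on a network error."""
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 405 or 503.
        return err.code
    except OSError:
        # DNS failure, refused connection, timeout, and similar.
        return None


def probe(url=URL):
    """Print the status returned for each method and header combination."""
    for method in ("HEAD", "GET"):
        for headers in CANDIDATE_HEADERS:
            status = try_request(build_request(url, headers, method))
            print(method, sorted(headers), "->", status)
```

Calling probe() prints one line per combination. If HEAD is rejected but GET returns 200, the curl -I result was about the request method rather than the User-Agent.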