I am making a spreadsheet which requires me to have the number of amazon reviews of a product. I looked up how to dynamically update the reviews as more came in and came across the IMPORTXML
formula.
In this specific example I can not find out how to get the information on the sheet as I keep getting the error "imported xml content can not be parsed."
My current formula is =IMPORTXML("http://www. amazon.com/Optoma-GT1080-1080p-Gaming-Projector/product-reviews/B00M9D4CAK/","//*[@id='cm_cr-product_info']/div/div[1]/div[2]/span")
(Space inserted before "amazon" to avoid the autoconversion of the link).
This is the piece i want to get.
Best Answer
Not every webpage can be scraped with
importXML
. There are several possible issues:importXML
.importXML
tries to sort things out, but in the presence of a lot of non-XML-compliant content (e.g., massive inlined scripts served by Amazon) it fails.With Amazon you are facing the second and third obstacles. A way to get around the second issue is to use
UrlFetchApp.fetch(url).getContentText()
from Apps Script. This gets the entire source of the page, which you can then parse in any way you want, e.g., with regular expressions.One has to be careful with scraping Amazon. When experimenting, I sometimes got the following message instead of the page content.
But here is an Apps Script function that grabs the number of reviews of this product and inserts it into cell A1 of the current sheet.
For all I know, this may still be against Amazon's TOS. Related: Is there an Amazon.com API to retrieve product reviews?