Google-sheets – IMPORTXML Google Spreadsheet Amazon review numbers

amazongoogle sheetsimportxmlxml

I am making a spreadsheet which requires me to have the number of amazon reviews of a product. I looked up how to dynamically update the reviews as more came in and came across the IMPORTXML formula.

In this specific example I can not find out how to get the information on the sheet as I keep getting the error "imported xml content can not be parsed."

My current formula is =IMPORTXML("http://www. amazon.com/Optoma-GT1080-1080p-Gaming-Projector/product-reviews/B00M9D4CAK/","//*[@id='cm_cr-product_info']/div/div[1]/div[2]/span")

(Space inserted before "amazon" to avoid the autoconversion of the link).

This is the piece i want to get.

this is what i'm trying to get

Best Answer

Not every webpage can be scraped with importXML. There are several possible issues:

Some content on the page may be generated with JavaScript, which is not executed by importXML.
The page may be invalid as XML file. A typical HTML page is invalid XML (unless it is an XTHML page). The parser of importXML tries to sort things out, but in the presence of a lot of non-XML-compliant content (e.g., massive inlined scripts served by Amazon) it fails.
The site may forbid bots from accessing certain content.

With Amazon you are facing the second and third obstacles. A way to get around the second issue is to use UrlFetchApp.fetch(url).getContentText() from Apps Script. This gets the entire source of the page, which you can then parse in any way you want, e.g., with regular expressions.

One has to be careful with scraping Amazon. When experimenting, I sometimes got the following message instead of the page content.

To discuss automated access to Amazon data please contact api-services-support@amazon.com.

But here is an Apps Script function that grabs the number of reviews of this product and inserts it into cell A1 of the current sheet.

function reviewCount() {   // remove space before amazon on the next line 
  var url = "http://www. amazon.com/Optoma-GT1080-1080p-Gaming-Projector/product-reviews/B00M9D4CAK/";
  var content = UrlFetchApp.fetch(url).getContentText();
  var match = /totalReviewCount">([0-9, ]+)/.exec(content);
  SpreadsheetApp.getActiveSheet().getRange("A1").setValue(match[1]);
}

For all I know, this may still be against Amazon's TOS. Related: Is there an Amazon.com API to retrieve product reviews?

Related Solutions

Amazon – What Kind of Links Are Accepted in Reviews

According to Amazon, they do not allow outside links in reviews. If your link would lead the reader away from the Amazon website, then it will be deleted or "Unlinked".

However, they do allow you to link to products on the Amazon website by using the ASIN number found on the product page.

The link must be formatted as follows:

[[ASIN:asin text for link]]

The ASIN number is found on the product page, usually where it lists product weight and other stats. Simply place that number within the brackets.

Example:

[[ASIN:B07WWWHSSC]]

Google Sheets – ImportXML Strips Leading Zeros Solution

With the formula TEXT(number, format) we can force the display of zeros.

In my case the formula is now: =TEXT(IMPORTXML(linkOfData,xpath), "000000000")

Best Answer

Related Solutions

Amazon – What Kind of Links Are Accepted in Reviews

Google Sheets – ImportXML Strips Leading Zeros Solution

Related Topic