Using Google Sheets with IMPORTXML
to scrape a piece of information from a website via XPath
.
The specific piece of information I'm trying to extract is the Price/Sales (TTM) number, i.e., 2.97
Here's the XPath (I've tested it using the Google Chrome Extensions XPath Helper and XPather. It works.):
//div[@id='audit-integrity'][1]/table/tbody/tr[10]/td[2]
Here's my Google Finance function:
=IMPORTXML( "https://eresearch.fidelity.com/eresearch/evaluate/fundamentals/keyStatistics.jhtml?stockspage=keyStatistics&symbols=aapl" , "//div[@id='audit-integrity'][1]/table/tbody/tr[10]/td[2]")
The function doesn't work. Why?
Best Answer
It's because of
<tbody>
. There is no<tbody>
in the HTML source. Browser puts it there (on its own; nothing to do with JavaScript on the page) because it believes it should be there, according to HTML spec.Just remove
/tbody
from the path:And in general: when XPath doesn't work, debug by removing all of the selectors, then adding one by one until the formula breaks...