Google Sheets – How to Determine the Xpath Query for IMPORTXML

google-sheets importxml xpath

I'm looking to retrieve the most recent financial figure for a single data point from a table using IMPORTXML.

The target page displays three sets of financial figures for a company on quarterly and annual bases. In this case, I only want the "Total Debt" figure listed from the balance sheet for whatever is the most recent fiscal period available.

I've used the following IMPORTXML formula:

=IMPORTXML("https://www.google.com/finance?q=GOOG&fstype=ii","//td[contains(.,'Total Debt')]")

This results in only the data label displayed vertically in two adjacent cells.

Total Debt
Total Debt

I've also used this IMPORTXML formula with the Xpath query determined using Chrome:

=IMPORTXML("https://www.google.com/finance?q=GOOG&fstype=ii","//*[@id='fs-table']/tbody/tr[27]/td[2]")

This returns four values (data from two financial statements for two periods each), displayed vertically in adjacent cells as shown below; the third value down is the correct one. Following an answer to another discussion, I also tried removing the "tbody" element node, which produced an "#N/A" error: "Imported content is empty."

4,207.00
15,826.00
5,208.00
5,220.00

For now, I'm trying to avoid using IMPORTHTML and INDEX for parsing a whole table since I need only a single value from the table.

How do I determine the Xpath query for this page to retrieve (1) the "Total Debt" figure for (2) always the most recent reporting period?

EDIT: Because there are two elements with the same text, "Total Debt," I also tried the formula below, with and without predicates (appending 1, 2, last(), etc. in square brackets), and each attempt returned an error saying the imported content is empty.

=IMPORTXML("https://www.google.com/finance?q=GOOG&fstype=ii","//*[local-name() = 'Total Debt'][1]")

Best Answer

Short answer

As far as I know, there is no straightforward method for determining an XPath query that works with IMPORTXML: its XPath 1.0 support does not appear to be fully implemented, and web developers follow their own practices when structuring their pages.

Explanation

Tools like Chrome Developer Tools or browser extensions/add-ons can sometimes help, but they don't always return an XPath query that IMPORTXML can use, because each tool's developers implemented XPath support differently. On the other hand, a web page may or may not comply with XML rules. So finding a working XPath query for IMPORTXML may require analyzing the structure of the source page and making several attempts.

XPath queries for the use case

Both of the XPath queries below return 5,208.00:

1.

//div[@id="balinterimdiv"]//tr[contains(.,'Total Debt')]/td[2]

2.

(//tr[contains(.,'Total Debt')]/td[2])[1]

Explanation

The page in question includes two views of the Balance Sheet: Quarterly Data and Annual Data. Both appear to have the same structure, and each includes a table cell (td tag) with the text Total Debt. Fortunately, each view sits inside its own div tag with its own id. So, to get only one value, the first step of the XPath query selects the right view, the second step selects the right table row (tr tag), and the third step selects the right table cell (td tag).
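
The three steps can be traced outside Sheets with a small Python sketch. The markup below is a made-up stand-in for the page, not a copy of it: only the id balinterimdiv comes from the queries above, while balannualdiv, the table contents, and the helper name total_debt_cell are assumptions for illustration. Python's standard xml.etree.ElementTree supports only a subset of XPath (no contains()), so the row-matching step is done in plain Python here.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the page structure (values are sample data).
SAMPLE = """
<html><body>
  <div id="balinterimdiv">
    <table>
      <tr><td>Total Assets</td><td>1.00</td><td>2.00</td></tr>
      <tr><td>Total Debt</td><td>5,208.00</td><td>3.00</td></tr>
    </table>
  </div>
  <div id="balannualdiv">
    <table>
      <tr><td>Total Debt</td><td>5,220.00</td><td>4.00</td></tr>
    </table>
  </div>
</body></html>
"""


def total_debt_cell(root, view_id, label="Total Debt"):
    # Step 1: select the view (div) by its id.
    view = root.find(f".//div[@id='{view_id}']")
    if view is None:
        return None
    # Step 2: select the first row whose text contains the label.
    for row in view.iter("tr"):
        if label in "".join(row.itertext()):
            # Step 3: select the second cell (td[2] in XPath is 1-based).
            cells = row.findall("td")
            return cells[1].text if len(cells) > 1 else None
    return None


root = ET.fromstring(SAMPLE.strip())
print(total_debt_cell(root, "balinterimdiv"))
```

Passing the other (assumed) id selects the other view instead, which is the reason the first step of the query matters.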

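Another approach is to wrap the whole query in parentheses and apply a positional predicate to the result, (xpath_query)[position() = 1], which can be abbreviated to (xpath_query)[1] (see the reference).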
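
The parentheses matter: a step such as td[2] is evaluated per row, so it can match one cell in every table, whereas a predicate applied to the parenthesized query picks a single node from the combined result. A rough Python sketch of that difference follows, using made-up markup. Since xml.etree.ElementTree does not parse the parenthesized form, list indexing stands in for it, and an exact-text predicate stands in for contains(); both substitutions are assumptions of this sketch.

```python
import xml.etree.ElementTree as ET

# Made-up markup: two tables that each contain a "Total Debt" row.
SAMPLE = """
<body>
  <table><tr><td>Total Debt</td><td>5,208.00</td></tr></table>
  <table><tr><td>Total Debt</td><td>5,220.00</td></tr></table>
</body>
"""

root = ET.fromstring(SAMPLE.strip())

# Rough analogue of //tr[contains(.,'Total Debt')]/td[2]:
# one cell per matching row, in document order.
per_row = [td.text for td in root.findall(".//tr[td='Total Debt']/td[2]")]

# Rough analogue of (...)[1]: XPath positions are 1-based, Python's are 0-based.
first_only = per_row[0]

print(per_row)
print(first_only)
```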

References