Google Sheets – Extracting GICS Codes from Fidelity Website

google-sheets importdata importhtml importxml xpath

This is the website:

https://eresearch.fidelity.com/eresearch/goto/evaluate/snapshot.jhtml?symbols=aapl&type=o-NavBar

I'm trying to pull out the following two pieces of data into a Google Sheet:

Sector (GICS®)
Industry (GICS®)

That is, I want the following two to show up for the above:

Information Technology
Technology Hardware, Storage & Peripherals

I've tried the usual techniques including:

importData
importHTML
importXML (this gave an error).

The XPaths I've derived via Google's Inspect Element tool are:

//*[@id="companyProfile"]/div[8]/span/a
//*[@id="companyProfile"]/div[13]/span/a

Nothing has worked so far. How can I extract this data into a Google Sheet?

Best Answer

IMPORTXML ignores nodes that have no text content. For example, evaluating //div[3] against a document whose body consists of

<div>First</div> <div>Second</div> <div></div> <div>Fourth</div>

results in "Fourth". So, when you count the <div> elements, skip over those where there is no text. The elements you are looking for are returned with

=IMPORTXML( url , "//div[@id='companyProfile']/div[4]/span")

and

=IMPORTXML( url , "//div[@id='companyProfile']/div[5]/span")
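The counting rule described above can be mimicked outside Sheets to work out which index to use. The following is a minimal Python sketch, not a reproduction of how IMPORTXML works internally: it assumes well-formed markup, and the helper name nonempty_children is made up for illustration. It keeps only the <div> children that contain text and numbers them from 1.

```python
import xml.etree.ElementTree as ET

# The example body from the answer, wrapped in a root element so it parses as XML.
SAMPLE = "<body><div>First</div> <div>Second</div> <div></div> <div>Fourth</div></body>"


def nonempty_children(parent, tag):
    """Return the direct children named `tag` that contain non-whitespace text."""
    return [child for child in parent.findall(tag)
            if "".join(child.itertext()).strip()]


body = ET.fromstring(SAMPLE)
divs = nonempty_children(body, "div")

# Number the surviving elements from 1, following the rule described in the answer.
for index, div in enumerate(divs, start=1):
    print(index, div.text)
```

Under this numbering the empty third <div> drops out, so index 3 lands on "Fourth".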

For a more robust solution, I advise not relying on the numbering of elements at all. The following command returns both values you want with a single call, one under another:

=IMPORTXML( url , "//div[@id='companyProfile']/div[@class='sub-heading']/span")

You can apply TRANSPOSE to the result to put them side by side, if preferable. Or, if some custom positioning is needed, get one at a time with

=IMPORTXML( url , "//div[@id='companyProfile']/div[@class='sub-heading'][1]/span")

and

=IMPORTXML( url , "//div[@id='companyProfile']/div[@class='sub-heading'][2]/span")

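Note that one should use single quotes inside the XPath, since the XPath itself is a string surrounded by double quotes in the formula. A double quote inside that string would have to be doubled to be accepted by Sheets.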
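To see why selecting by class is less fragile than selecting by position, here is a small Python sketch using the standard library's limited XPath support. The markup is a hypothetical, simplified stand-in for the profile block (the real page is larger and may not parse as XML), so treat it as an illustration of the idea rather than a test against the live site.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the profile block on the page.
SAMPLE = """
<html><body>
  <div id="companyProfile">
    <div class="description">Company description</div>
    <div class="sub-heading">Sector (GICS)
      <span><a href="#">Information Technology</a></span></div>
    <div class="other"></div>
    <div class="sub-heading">Industry (GICS)
      <span><a href="#">Technology Hardware, Storage &amp; Peripherals</a></span></div>
  </div>
</body></html>
"""

# Same shape as the XPath in the formula: single quotes inside the string.
PATH = ".//div[@id='companyProfile']/div[@class='sub-heading']/span"

root = ET.fromstring(SAMPLE)

# Collect the text of every matching <span>, in document order.
values = ["".join(span.itertext()).strip() for span in root.findall(PATH)]

sector, industry = values
print(sector)
print(industry)
```

Adding or removing unrelated <div> elements in the sample does not change the result, which is the point of matching on the class attribute.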

Using the Inspect Element tool is not an inherently bad idea (it shows a nice tree view of the document), but there is an important caveat: the tool shows the document after any JavaScript on the page has run, while IMPORTXML gets the source as it is before any JavaScript processing. This matters when some elements are added by a script (example in my answer here). To see exactly what IMPORTXML works with, use right-click -> "View Page Source" in Chrome, or its equivalent in other browsers.
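The same check can be scripted. The sketch below, in Python with only the standard library, collects the id attributes that exist in the raw source, so you can tell whether the element you are targeting is there before any script runs. The names IdCollector, ids_in_source and fetch_raw_source are made up for this example, and the User-Agent header is an assumption: some sites reject requests without one, and some reject scripted requests entirely.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class IdCollector(HTMLParser):
    """Collect the id attribute of every start tag in the raw HTML."""

    def __init__(self):
        super().__init__()
        self.ids = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "id" and value:
                self.ids.add(value)


def ids_in_source(html):
    """Return the set of element ids present in the HTML text as served."""
    parser = IdCollector()
    parser.feed(html)
    parser.close()
    return parser.ids


def fetch_raw_source(url):
    """Download the page source without running any JavaScript."""
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


# Offline demonstration: the <p> is only written by a script, so it is not
# an element of the raw source even though a browser would show it.
RAW = '<div id="static"></div><script>document.write("<p id=dynamic>")</script>'
print(ids_in_source(RAW))
```

For a live page, something like `"companyProfile" in ids_in_source(fetch_raw_source(url))` would indicate whether that element is present in the source that IMPORTXML sees.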

Short answer

AFAIK, there is no straightforward method for finding the XPath query to use with IMPORTXML: its XPath 1.0 support appears not to be fully implemented, and web page developers are free to follow their own practices when structuring their pages.

Explanation

Tools like Chrome Developer Tools or browser extensions/add-ons can be helpful, but they sometimes return an XPath query that cannot be used with IMPORTXML, because each tool's developers implemented XPath support differently. On the other hand, web pages may or may not comply with XML rules. So, to find the XPath query to use with IMPORTXML, it may be necessary to analyze the structure of the source web page and make several attempts.

XPath queries for the use case

Both of the XPath queries below return 5,208.00:

//div[@id="balinterimdiv"]//tr[contains(.,'Total Debt')]/td[2]

(//tr[contains(.,'Total Debt')]/td[2])[1]

Explanation

The referenced page includes two views of the Balance Sheet: Quarterly Data and Annual Data. Both appear to have the same structure, and both include a table cell (td tag) with the text Total Debt. Fortunately, each view is inside a div tag with its own id. So, to get only one value, the first step in the XPath query is to select the right view, the second step is to select the right table row (tr tag), and the third step is to select the right table cell (td tag).

Another approach is to use the construct (xpath_query)[position() = 1] (see the reference).
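Both approaches can be sketched in Python with the standard library. Its XPath subset has no contains(), so the row filter is written in plain Python instead; this illustrates the selection steps, not the exact queries above. The markup is a hypothetical simplification: "balinterimdiv" comes from the query above, while "balannualdiv", the annual figure, and the helper name cells_for_label are made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup: two views that share the same row label.
SAMPLE = """
<html><body>
  <div id="balinterimdiv">
    <table>
      <tr><td>Total Assets</td><td>9,999.00</td></tr>
      <tr><td>Total Debt</td><td>5,208.00</td></tr>
    </table>
  </div>
  <div id="balannualdiv">
    <table>
      <tr><td>Total Debt</td><td>1,234.00</td></tr>
    </table>
  </div>
</body></html>
"""


def cells_for_label(scope, label):
    """Second cell of every row under `scope` whose text contains `label`."""
    matches = []
    for row in scope.iter("tr"):
        if label in "".join(row.itertext()):
            cells = row.findall("td")
            if len(cells) >= 2:
                matches.append(cells[1].text)
    return matches


root = ET.fromstring(SAMPLE)

# Approach 1: select the view by id, then the row, then the cell.
interim_view = root.find(".//div[@id='balinterimdiv']")
by_view = cells_for_label(interim_view, "Total Debt")[0]

# Approach 2: collect every match in the document and keep only the first.
first_overall = cells_for_label(root, "Total Debt")[0]

print(by_view, first_overall)
```

The first approach keeps working if the order of the views changes, while the second depends on the wanted view appearing first in the document.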

References

Answer by Dimitre Novatchev to "What is the XPath expression to find only the first occurrence?", referenced by Dale in a comment on another answer to the question in this thread.
