This is the website:
https://eresearch.fidelity.com/eresearch/goto/evaluate/snapshot.jhtml?symbols=aapl&type=o-NavBar
I'm trying to pull out the following two pieces of data into a Google Sheet:
- Sector (GICSĀ®)
- Industry (GICSĀ®)
That is, I want the following two to show up for the above:
- Information Technology
- Technology Hardware, Storage & Peripherals
I've tried the usual techniques including:
- importData
- importHTML
- importXML (this gave an error).
The XPath's I've derived via Google's Inspect Element tool are:
//*[@id="companyProfile"]/div[8]/span/a
//*[@id="companyProfile"]/div[13]/span/a
Nothing has worked so far. How can I extract this data into a Google Sheet?
Best Answer
The command
IMPORTXML
ignores the nodes with no text content: for example, taking//div[3]
from the document where the body consists ofresults in "Fourth". So, when you count the
<div>
elements, skip over those where there is no text. The elements you are looking for are returned withand
For a more robust solution, I advise not relying on the numbering of elements at all. The following command returns both values you want with a single call, one under another:
You can apply
TRANSPOSE
to the result to put them side by side, if preferable. Or, if some custom positioning is needed, get one at a time withand
Note that one should use single quotes in an XPath command, since it's a string surrounded by double quotes.
Using Inspect Element tool is not an inherently bad idea (it shows a nice tree view of the document), but there is an important caveat: this tool shows the document after any JavaScript runs on the page, while
IMPORTXML
gets the source as it is before any JavaScript processing. This matters when some elements get added by a script (example in my answer here). To see exactly whatIMPORTXML
works with, use right-click -> "View Page Source" in Chrome, or its equivalent in other browsers.