The solution was to select all the cells (I can't find a Select All
menu item, but on OS X Command-A worked), then choose Copy
, then Paste special/Paste values only
. Bingo: all the cells whose values depend on the ImportHTML()
in A1 now have their values pasted.
The command IMPORTXML
ignores the nodes with no text content: for example, taking //div[3]
from the document where the body consists of
<div>First</div> <div>Second</div> <div></div> <div>Fourth</div>
results in "Fourth". So, when you count the <div>
elements, skip over those where there is no text. The elements you are looking for are returned with
=IMPORTXML( url , "//div[@id='companyProfile']/div[4]/span")
and
=IMPORTXML( url , "//div[@id='companyProfile']/div[5]/span")
For a more robust solution, I advise not relying on the numbering of elements at all. The following command returns both values you want with a single call, one under another:
=IMPORTXML( url , "//div[@id='companyProfile']/div[@class='sub-heading']/span")
You can apply TRANSPOSE
to the result to put them side by side, if preferable. Or, if some custom positioning is needed, get one at a time with
=IMPORTXML( url , "//div[@id='companyProfile']/div[@class='sub-heading'][1]/span")
and
=IMPORTXML( url , "//div[@id='companyProfile']/div[@class='sub-heading'][2]/span")
Note that one should use single quotes in an XPath command, since it's a string surrounded by double quotes.
Using Inspect Element tool is not an inherently bad idea (it shows a nice tree view of the document), but there is an important caveat: this tool shows the document after any JavaScript runs on the page, while IMPORTXML
gets the source as it is before any JavaScript processing. This matters when some elements get added by a script (example in my answer here). To see exactly what IMPORTXML
works with, use right-click -> "View Page Source" in Chrome, or its equivalent in other browsers.
Best Answer
the simplest thing you can use is Google Sheet - online tool on steroids based on MS Excel.
and all you need is to paste this into some cell:
demo spreadsheet: https://docs.google.com/spreadsheets/d/
*
bold*
fix:as you can notice, the formula above imports data as is, with the partial formatting of previously bold parts of the text under B column. this can be easily eliminated with this formula which converts imported data straight on text with
TO_TEXT
and thenREGEXREPLACE
converts all"\*"
to""
no space all wrapped inARRAYFORMULA
to ensure continuity.demo spreadsheet of the fix: https://docs.google.com/spreadsheets/d/