I want to import a table of nuclear power output from https://ru.wikipedia.org/wiki/Атомная_энергетика_по_странам.
For some reason, the second col has the number duplicated, eg
<td><span style="display: none; speak: none;">000935</span>
935</td>
Is there a way to affect the work of IMPORTHTML() to get only the first child of <td>
?
Similarly, in the second table, last column, Google Sheet extracts this:
British Nuclear Fuels (англ.)русск.
I want to remove the texts (англ.)
and русск.
The text I need is again in the first child of <td>
:
<td><a href="//en.wikipedia.org/wiki/British_Nuclear_Fuels" class="extiw" title="en:British Nuclear Fuels">British Nuclear Fuels</a>
<span style="font-size:95%; position: relative; top: .4em"> <span class="ref-info" title="на английском языке" style="font-size:85%; cursor:help; color:#888;">(англ.)</span>
</span><span class="link-ru" style="font-size:80%; margin-left:-1.7em; position: relative; top: -.4em;"><a href="/w/index.php?title=British_Nuclear_Fuels&action=edit&redlink=1" class="new" title="British Nuclear Fuels (страница отсутствует)">русск.</a></span></td>
Also, I'd like to clean up bracketed references from col headings, eg [7]
Best Answer
| {{~|001627}}1 627
source cleaned halved check 101240102 709 101240102709 102709 FALSE
Using these formulas:
=REGEXREPLACE(TEXT(source,"0"),"^0+| ","")
=replace(cleaned,1,len(cleaned)/2,"")
=cleaned=concat(halved,halved)
A few values don't check out, but the difference between the two halves is small, so that's ok