Google-sheets – Extracting / scraping specific information off a website programatically

google sheetsimporthtml

I have a public website link from which I need to extract a certain field programmatically.

https://echa.europa.eu/fr/registration-dossier/-/registered-dossier/14500

e.g. The section that says "Registrants /Suppliers of the substance"

Best Answer

the simplest thing you can use is Google Sheet - online tool on steroids based on MS Excel.

and all you need is to paste this into some cell:

=IMPORTHTML("https://echa.europa.eu/fr/registration-dossier/-/registered-dossier/14500"; 
            "TABLE"; 1)

demo spreadsheet: https://docs.google.com/spreadsheets/d/



*bold* fix:

as you can notice, the formula above imports data as is, with the partial formatting of previously bold parts of the text under B column. this can be easily eliminated with this formula which converts imported data straight on text with TO_TEXT and then REGEXREPLACE converts all "\*" to "" no space all wrapped in ARRAYFORMULA to ensure continuity.

=ARRAYFORMULA(REGEXREPLACE(TO_TEXT(
 IMPORTHTML("https://echa.europa.eu/fr/registration-dossier/-/registered-dossier/14500"; 
 "TABLE"; 1)); "\*"; ""))

demo spreadsheet of the fix: https://docs.google.com/spreadsheets/d/