Google-sheets – ImportHTML Altering Data Format/URL Fetch Issue

formulasgoogle sheetsgoogle-apps-scriptimporthtml

4/17 6PM Update *** I am now trying to use the URL Fetch App. I've been playing with this for hours, but can't seem to figure out the issue. I am just trying to get the info from this site as plain text (so the score doesn't mess up) in a Google Sheet.

function myFunction() {
    var response = UrlFetchApp.fetch("https://www.basketball-reference.com/boxscores/pbp/202003110MIA.html");
 Logger.log(response.getContentText());
   var doc = SpreadsheetApp.getActiveSpreadsheet();
   var cell = doc.getRange('a1');
}

I'm using ImportHTML to export NBA play by play data.

For some reason, random cells containing the score (formatted 12-10 as an example) transfer incorrectly as long numbers (43864 as an example).

I've tried various formatting options that are offered & I've looked at different games as well, but don't see any patterns as to why certain scores work and others don't.

Lastly, when I copy and paste the info directly from the site it works flawlessly, but when you paste as values the score becomes incorrect — I'm assuming that is what ImportHTML is doing.

Could someone possibly explain to me why this is happening? Is there a way to make sure all scores stay correctly formatted?

For reference, I'm building a script that will take all of this information, and transform it so that I can track different events for players based off of time. This has been my one little hiccup I can't seem to get past.

I've attached screenshots of the error in Sheets, along with the information as it is on the website.enter image description here

enter image description here

Best Answer

Tl;Dr IMPORTHTML and IMPORTXML assigns the data type before adding the values to the spreadsheet. Numbers like 43863 are serial numbers used to represent dates, time and durations.

Google Sheets

  • automatically try to detect the data type of the value entere on each cell based on the spreadsheet regional setting.
  • uses serial numbers to handle dates and durations
  • IMPORTHTML is a function that is calculated on the server instead of being calculated on the client (web browser)

You could try to build a complex formula (see Trying to use Google Sheets importHTML() to import a table. It forces content to a date format) but as you already are building a script it's very likely that using the script to scrap the web content will be better as you will have more control about how the content is parsed.