XPath – Get Non-Distinct Parent Node Data for Each Child Node

Google Spreadsheets has an IMPORTXML function, which I am trying to use to maintain a list of TV episodes, based on the API at http://services.tvrage.com/feeds/episode_list.php?sid=3183.

I am trying to get a table of Date | Season No | Episode No | Title from that XML.

The XML is hierarchical, nesting episodes under seasons.

I can get my date, episode and title columns, using xpath:

Date – /Show/Episodelist/Season/episode/airdate
Episode No – /Show/Episodelist/Season/episode/seasonnum
Title – /Show/Episodelist/Season/episode/title

However, I cannot get seem to find the right xpath to extract the Season number for each episode entry, which is the no attribute to the Season parent node.

I have tried:

/Show/Episodelist/Season/episode/../@no
/Show/Episodelist/Season/episode/parent::Season/@no

Both of these result in getting a distinct list of seasons, i.e.:

Date         Season  Episode  Title
01-01-2001   1       1        foo
02-01-2001   2       2        bar
03-01-2001   3       3        baz
04-01-2001           4        fee
05-01-2001           5        fob
01-03-2002           1        bix
02-03-2002           2        buz
03-03-2002           3        fez
04-03-2002           4        baj

…whereas I am looking for:

Date         Season  Episode  Title
01-01-2001   1       1        foo
02-01-2001   1       2        bar
03-01-2001   1       3        baz
04-01-2001   1       4        fee
05-01-2001   1       5        fob
01-03-2002   2       1        bix
02-03-2002   2       2        buz
03-03-2002   2       3        fez
04-03-2002   2       4        baj

What is the correct xpath so that the IMPORTXML routine does not do a distinct operation on the imported data?

Best Answer

As stated in my comment, doing this with =importXML and XPath seems tricky.

But doing it programmatically, using the XmlService of Google Apps Script, looks easier.

I have written the following:

In effect, it fetches the content from the given URL, parses it as XML, and iterates through the elements, extracting values of the elements you wanted, which is stored in a 2-dimensional array (result), and returned.

To use this, you must install the script in your spreadsheet. Click the Tools menu → Script editor, and paste the code listed above. Save the script.

In the cell where you want the output, enter

=parseTvRageXml("http://services.tvrage.com/feeds/episode_list.php?sid=3183")

The result should be a list of episodes, with the following columns:

Season number | epnum | seasonnum | airdate | link | title

I have set up an example spreadsheet to demonstrate this, feel free to copy it.

Best Answer

Related Solutions

Google Sheets – Create Sparkline for Delta Between Columns

Google Sheets – Report Date/Column Header for First and Last Non-Zero Entries