Google Spreadsheets has an IMPORTXML function, which I am trying to use to maintain a list of TV episodes, based on the API at http://services.tvrage.com/feeds/episode_list.php?sid=3183.
I am trying to get a table of Date | Season No | Episode No | Title
from that XML.
The XML is hierarchical, nesting episodes under seasons.
I can get my date, episode and title columns, using xpath:
- Date –
/Show/Episodelist/Season/episode/airdate
- Episode No –
/Show/Episodelist/Season/episode/seasonnum
- Title –
/Show/Episodelist/Season/episode/title
However, I cannot get seem to find the right xpath to extract the Season number for each episode entry, which is the no
attribute to the Season
parent node.
I have tried:
/Show/Episodelist/Season/episode/../@no
/Show/Episodelist/Season/episode/parent::Season/@no
Both of these result in getting a distinct list of seasons, i.e.:
Date Season Episode Title
01-01-2001 1 1 foo
02-01-2001 2 2 bar
03-01-2001 3 3 baz
04-01-2001 4 fee
05-01-2001 5 fob
01-03-2002 1 bix
02-03-2002 2 buz
03-03-2002 3 fez
04-03-2002 4 baj
…whereas I am looking for:
Date Season Episode Title
01-01-2001 1 1 foo
02-01-2001 1 2 bar
03-01-2001 1 3 baz
04-01-2001 1 4 fee
05-01-2001 1 5 fob
01-03-2002 2 1 bix
02-03-2002 2 2 buz
03-03-2002 2 3 fez
04-03-2002 2 4 baj
What is the correct xpath so that the IMPORTXML routine does not do a distinct operation on the imported data?
Best Answer
As stated in my comment, doing this with
=importXML
and XPath seems tricky.But doing it programmatically, using the XmlService of Google Apps Script, looks easier.
I have written the following:
In effect, it fetches the content from the given URL, parses it as XML, and iterates through the elements, extracting values of the elements you wanted, which is stored in a 2-dimensional array (
result
), and returned.To use this, you must install the script in your spreadsheet. Click the Tools menu → Script editor, and paste the code listed above. Save the script.
In the cell where you want the output, enter
The result should be a list of episodes, with the following columns:
I have set up an example spreadsheet to demonstrate this, feel free to copy it.