Google Sheets – ImportHTML #VALUE! Error for Specific URL

google-sheets importhtml

I'm trying to import the data from http://wodindex.wikispaces.com/ into my spreadsheet. Importing from a different subdomain of wikispaces.com doesn't work either.

Trying to get data from this site gives me a #VALUE! error, while the same formula used with another URL (Wikipedia) does work.

What causes this error? Is this site somehow protected?

MWE: =ImportHTML("http://wodindex.wikispaces.com/";"list";1)

Best Answer

It seems wodindex.wikispaces.com attempts to store a cookie and then performs a series of redirects, for reasons I don't know. The functionality behind ImportHTML has to act as an HTTP client (much like a browser does), and it probably does not support cookies.

This is what I get from running wget http://wodindex.wikispaces.com/ twice in a row:

--12:41:12--  http://wodindex.wikispaces.com/
           => `index.html'
Resolving wodindex.wikispaces.com... done.
Connecting to wodindex.wikispaces.com[75.126.104.177]:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://session.wikispaces.com/1/auth/auth?authToken=f7a1a3abdd9511c29392cf7000b27dd5 [following]
--12:41:13--  https://session.wikispaces.com/1/auth/auth?authToken=f7a1a3abdd9511c29392cf7000b27dd5
           => `auth@authToken=f7a1a3abdd9511c29392cf7000b27dd5'
Resolving session.wikispaces.com... done.
Connecting to session.wikispaces.com[208.43.192.33]:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: http://wodindex.wikispaces.com/?responseToken=f7a1a3abdd9511c29392cf7000b27dd5 [following]

--12:41:14--  http://wodindex.wikispaces.com/?responseToken=f7a1a3abdd9511c29392cf7000b27dd5
           => `index.html@responseToken=f7a1a3abdd9511c29392cf7000b27dd5'
Connecting to wodindex.wikispaces.com[75.126.104.177]:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: http://wodindex.wikispaces.com/ [following]
http://wodindex.wikispaces.com/: Redirection cycle detected.

C:\Users\viramd>wget http://wodindex.wikispaces.com/
--12:42:28--  http://wodindex.wikispaces.com/
           => `index.html'
Resolving wodindex.wikispaces.com... done.
Connecting to wodindex.wikispaces.com[75.126.104.177]:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://session.wikispaces.com/1/auth/auth?authToken=2141639d8901c291dc288a940c9609e8 [following]
--12:42:28--  https://session.wikispaces.com/1/auth/auth?authToken=2141639d8901c291dc288a940c9609e8
           => `auth@authToken=2141639d8901c291dc288a940c9609e8'
Resolving session.wikispaces.com... done.
Connecting to session.wikispaces.com[208.43.192.33]:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: http://wodindex.wikispaces.com/?responseToken=2141639d8901c291dc288a940c9609e8 [following]

--12:42:29--  http://wodindex.wikispaces.com/?responseToken=2141639d8901c291dc288a940c9609e8
           => `index.html@responseToken=2141639d8901c291dc288a940c9609e8'
Connecting to wodindex.wikispaces.com[75.126.104.177]:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: http://wodindex.wikispaces.com/ [following]
http://wodindex.wikispaces.com/: Redirection cycle detected.

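As can be seen here, the request is redirected to session.wikispaces.com, then back to wodindex.wikispaces.com with a responseToken, and finally to the original URL again, where the whole sequence starts over. For a client that does not keep the cookie, this redirect loop can never end.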
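To illustrate why a client without cookie support would loop here, below is a small Python simulation. It does not contact the real site: fake_server is a hypothetical stand-in that mimics the three 302 responses from the wget log, and the cookie name "session", the placeholder TOKEN, and the idea that the cookie is set on the last redirect are all guesses on my part. The cycle check used here (the same URL seen again with the same cookies) is my own simplification, not necessarily how wget or ImportHTML decide to give up.

```python
from typing import Dict, List, Optional, Tuple

# URLs modelled on the wget log; TOKEN is a placeholder for the per-request token.
START = "http://wodindex.wikispaces.com/"
AUTH = "https://session.wikispaces.com/1/auth/auth?authToken=TOKEN"
BACK = "http://wodindex.wikispaces.com/?responseToken=TOKEN"


def fake_server(url: str, cookies: Dict[str, str]) -> Tuple[int, Optional[str], Dict[str, str]]:
    """Hypothetical stand-in for the site: returns (status, location, cookies to set)."""
    if url == START:
        # Assumption: the page is only served once the session cookie is present.
        if cookies.get("session") == "TOKEN":
            return 200, None, {}
        return 302, AUTH, {}
    if url == AUTH:
        return 302, BACK, {}
    if url == BACK:
        # Assumption: the cookie is set on the redirect back to the start page.
        return 302, START, {"session": "TOKEN"}
    return 404, None, {}


def follow(url: str, keep_cookies: bool, max_hops: int = 10) -> Tuple[str, List[str]]:
    """Follow redirects and return a status string plus the list of URLs requested."""
    jar: Dict[str, str] = {}
    visited: List[str] = []
    seen = set()
    for _ in range(max_hops):
        # Seeing the same URL with the same cookies again means no progress was made.
        state = (url, tuple(sorted(jar.items())))
        if state in seen:
            return "cycle", visited
        seen.add(state)
        visited.append(url)
        status, location, set_cookies = fake_server(url, jar)
        if keep_cookies:
            jar.update(set_cookies)
        if status == 200:
            return "ok", visited
        if status != 302 or location is None:
            return "error", visited
        url = location
    return "too many redirects", visited
```

With keep_cookies=False the function reports a cycle after requesting the three URLs once each, which matches the shape of the wget output. With keep_cookies=True the fourth request (the start page again, now with the cookie) succeeds.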

One 'solution' to your problem is to copy the contents of wodindex.wikispaces.com to another web server and import them from there. But I'm not sure that would be entirely legal.
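If you go that route, the copy has to be made with a client that keeps cookies across the redirects. A minimal sketch using only Python's standard library follows. The helper names make_cookie_opener and save_page are mine, and I have not verified that a cookie jar alone is enough to get past this particular site's redirects.

```python
import http.cookiejar
import urllib.request


def make_cookie_opener() -> urllib.request.OpenerDirector:
    """Build an opener that stores cookies and sends them back on later requests."""
    jar = http.cookiejar.CookieJar()
    return urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))


def save_page(url: str, path: str, opener: urllib.request.OpenerDirector) -> int:
    """Fetch url with the given opener, write the body to path, return the byte count."""
    with opener.open(url, timeout=30) as response:
        body = response.read()
    with open(path, "wb") as out:
        out.write(body)
    return len(body)
```

Calling save_page("http://wodindex.wikispaces.com/", "wodindex.html", make_cookie_opener()) would write the page to wodindex.html (a file name picked for illustration), which you could then upload somewhere ImportHTML can reach without cookies.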