New Google Sheets Download URL and Errors Explained

google sheets

I've been downloading the first sheet of old Google Spreadsheets as a TSV file using URLs like:

 https://docs.google.com/feeds/download/spreadsheets/Export?key=ID&exportFormat=tsv&gid=0

After creating some new Google Sheets, I notice that the URL format in exportLinks has changed to this sort of thing (could only see PDF and XSLX formats though):

 https://docs.google.com/spreadsheets/export?id=ID&exportFormat=tsv&gid=0

But while either URL format works fine for the old Spreadsheets, neither of them works for the new Google Sheets… instead I get an error about this not being a P3P policy.

So can new Sheets only be downloaded using some new protocol surrounding the URL?

That would be sad.

I'm using the Python 2 Google API (because there hasn't been a Python 3 version).

Does it need to be upgraded?

Is there a Python 3 version?

Why does a P3P policy error show up?

What question should I be asking?

The Google Sheets I'm trying to download was created by me, although it is shared with others. I can export it from the menu in the browser, but can't grab it from my program. Would surely like to download data from the new Sheets, as well as the old Spreadsheets.

Here are my imports:

from apiclient import errors
from apiclient.discovery import build
from apiclient.http import MediaFileUpload
from oauth2client.tools import run
from oauth2client.file import Storage
from oauth2client.client import OAuth2WebServerFlow

Edit:
While I am using the Google Python library, additional testing using only the browser inspired by the first answer received from pnuts indicates that the same URLs have the same problems direct from an appropriately logged in browser, so the age of the Python library doesn't seem to be the issue.

Best Answer

OK, some more searching and I found enough hints‡ to find something that works, whether it is officially documented this way, I have no idea. I would love to know where the documentation might be.

https://docs.google.com/spreadsheets/d/ID/export?format=tsv&gid=GID

The ID is what you would expect, the ID of the sheet's key from the URL. But the other surprising thing with Google Sheets, is that the GID no longer predictably starts at 0 for the first sheet, and increments for subsequent sheets. Whether this is supposed to add security or is an accident of the implementation, or an especially useful design for some internal reason, I do not know... but the GID numbers on a 7 sheet Sheet I just uploaded from XLS format are all large, and appear random.

While a particular sheet from a given spreadsheet may easily be determined from the URL displayed while viewing that Sheet in the browser, it looks like the knowledge that "all Sheets in this folder contain only one sheet" is insufficient to be able to download them, and the ability to "download the first sheet from each of these Sheets" is eliminated, unless the GIDs can be determined.

In fact, it seems my whole problem was the assumption of the first gid being zero, which it always has been for old Sheets... the URL form in this answer seems to be closest to what is used for editing new Sheets in the browser, but both forms of URL in the original question also work, if the proper GID is used.

Bonus points for a technique to determine the GID... based on what I thought I knew about old Sheets, I was planning for a system of 8000+ Sheets, with 4-6 sheets each, and thinking I could download them, one sheet of each Sheet at a time, by iterating through low-numbered GID values. KaBLAM goes that design, with the non-predictable GID values!

‡ The hints start here (but this solution seems excessively lengthy) and proceed to the here and here.

Best Answer

Related Solutions

Google-sheets – “the key” in the Google Sheets URL

Google Sheets Links – How to Create a URL for a Specific Cell in Google Sheets

Related Topic