Google-sheets – How does Google know that I am not a human

google apigoogle sheets

Over on Stack Overflow I asked a question regarding accessing Google Spreadsheets with CURL, which doesn't seem to be possible any more, even when setting a valid user agent string. My question arising from that finding is the following:

How does Google know that my request sent with CURL via a valid user agent string does not originate from a browser?

Best Answer

Without asking Google, there's really no way to know for sure. However, aside from the user agent, they could be looking for certain commonly used HTTP headers that are routinely sent by web browsers but normally omitted in curl requests. This may include, but not be limited to:

cookies - saved cookies, which may include session or login credentials
referer - the page you came from
accept - the type of content you can handle
accept-language - the language(s) you can read
accept-encoding - the data encodings you support
caching-related headers
security-related headers
others...

These headers are optional in most circumstances, but sending them helps a server give the best possible response, so browsers normally send them. A clever site could use the lack of common headers as a way to tell the difference between a client using a normal browser and using a downloader like curl. Especially a site like Google, which gets enough traffic to have a pretty good idea of what kind of behavior to expect from most popular browsers.

Having said that, I personally doubt Google Sheets is trying to block curl; rather, there's probably a necessary header that is being left out on the curl request. You can always use the --header option of curl to add any necessary headers to the curl request in order to better mimic real-world browser behavior.

Formula

=MATCH(MAX(FILTER(B122:U122, B122:U122<0)),B122:U122, 0)

Explained

The third parameter of the MATCH will tell it to find an exact match, as stated in the help:

0 indicates exact match, and is required in situations where range is not sorted.

By default it is set at 1, if no value is given. This will yield a completely different result.

Reference

New Google Sheets Download URL and Errors Explained

OK, some more searching and I found enough hints‡ to find something that works, whether it is officially documented this way, I have no idea. I would love to know where the documentation might be.

https://docs.google.com/spreadsheets/d/ID/export?format=tsv&gid=GID

The ID is what you would expect, the ID of the sheet's key from the URL. But the other surprising thing with Google Sheets, is that the GID no longer predictably starts at 0 for the first sheet, and increments for subsequent sheets. Whether this is supposed to add security or is an accident of the implementation, or an especially useful design for some internal reason, I do not know... but the GID numbers on a 7 sheet Sheet I just uploaded from XLS format are all large, and appear random.

While a particular sheet from a given spreadsheet may easily be determined from the URL displayed while viewing that Sheet in the browser, it looks like the knowledge that "all Sheets in this folder contain only one sheet" is insufficient to be able to download them, and the ability to "download the first sheet from each of these Sheets" is eliminated, unless the GIDs can be determined.

In fact, it seems my whole problem was the assumption of the first gid being zero, which it always has been for old Sheets... the URL form in this answer seems to be closest to what is used for editing new Sheets in the browser, but both forms of URL in the original question also work, if the proper GID is used.

Bonus points for a technique to determine the GID... based on what I thought I knew about old Sheets, I was planning for a system of 8000+ Sheets, with 4-6 sheets each, and thinking I could download them, one sheet of each Sheet at a time, by iterating through low-numbered GID values. KaBLAM goes that design, with the non-predictable GID values!

‡ The hints start here (but this solution seems excessively lengthy) and proceed to the here and here.

Best Answer

Related Solutions

Google-sheets – MATCH function does not work

Formula

Explained

Reference

New Google Sheets Download URL and Errors Explained

Related Topic