Google-sheets – How does Google know that I am not a human

google api, google sheets

Over on Stack Overflow I asked a question about accessing Google Spreadsheets with curl, which no longer seems to be possible, even when setting a valid user agent string. That finding leads to the following question:

How does Google know that a request sent with curl, using a valid user agent string, does not originate from a browser?

Best Answer

Without asking Google, there's really no way to know for sure. However, aside from the user agent, they could be looking for HTTP headers that web browsers routinely send but curl normally omits. These may include, but are not limited to:

  • cookie - saved cookies, which may include session or login credentials
  • referer - the page you came from
  • accept - the type of content you can handle
  • accept-language - the language(s) you can read
  • accept-encoding - the data encodings you support
  • caching-related headers
  • security-related headers
  • others...

These headers are optional in most circumstances, but sending them helps a server give the best possible response, so browsers normally send them. A clever site could use the absence of common headers to tell a client using a normal browser apart from one using a downloader like curl, especially a site like Google, which gets enough traffic to have a pretty good idea of what behavior to expect from the most popular browsers.
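To make the idea concrete, here is a minimal sketch of how a server might apply such a check. It is purely illustrative: the header list, the threshold, and the function names are assumptions made up for this example, and nothing here reflects what Google actually does.

```python
# Hypothetical server-side heuristic: count how many headers that browsers
# routinely send are missing from a request. The header list and the
# threshold are illustrative assumptions, not any real site's rules.
EXPECTED_BROWSER_HEADERS = (
    "user-agent",
    "accept",
    "accept-language",
    "accept-encoding",
)


def missing_browser_headers(headers):
    """Return the expected header names that are absent from the request."""
    # HTTP header names are case-insensitive, so compare in lowercase.
    present = {name.lower() for name in headers}
    return [name for name in EXPECTED_BROWSER_HEADERS if name not in present]


def looks_like_browser(headers, max_missing=1):
    """Guess 'browser' when at most max_missing expected headers are absent."""
    return len(missing_browser_headers(headers)) <= max_missing


# A curl request with a spoofed user agent still lacks the other headers.
curl_like = {"User-Agent": "Mozilla/5.0", "Accept": "*/*"}

# A browser typically sends all of them.
browser_like = {
    "User-Agent": "Mozilla/5.0",
    "Accept": "text/html",
    "Accept-Language": "en-US,en;q=0.9",
    "Accept-Encoding": "gzip, deflate",
}

print(missing_browser_headers(curl_like))
print(looks_like_browser(curl_like), looks_like_browser(browser_like))
```

A real service would likely combine many more signals than this, so treat the sketch only as an illustration of why changing the user agent alone may not be enough.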

Having said that, I personally doubt Google Sheets is trying to block curl; rather, there's probably a necessary header that is being left out of the curl request. You can always use curl's --header option to add any necessary headers to the request in order to better mimic real-world browser behavior.
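As a rough sketch of that approach, the snippet below assembles a curl command line that passes each header through --header. The URL and header values are placeholders chosen for illustration, and build_curl_command is a hypothetical helper; which headers a given site actually requires is something you would have to find out by comparing against a real browser request.

```python
import shlex


def build_curl_command(url, headers):
    """Build a curl command line that sends each header via --header."""
    args = ["curl"]
    for name, value in headers.items():
        # curl expects each header as a single "Name: value" argument.
        args += ["--header", f"{name}: {value}"]
    args.append(url)
    # shlex.join quotes each argument so the line is safe to paste into a shell.
    return shlex.join(args)


# Placeholder values; copy the real ones from your browser's network inspector.
browser_headers = {
    "User-Agent": "Mozilla/5.0",
    "Accept": "text/html,application/xhtml+xml",
    "Accept-Language": "en-US,en;q=0.9",
}

command = build_curl_command("https://example.com/", browser_headers)
print(command)
```

Accept-Encoding is left out on purpose: if you advertise gzip support by hand, curl will print the compressed bytes unless you also pass its --compressed option.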