Google Cloud Storage public object cached on the server side

google-cloud-storage http-headers

I have a Google Cloud Storage bucket with read permission for allUsers; it is not configured as a website (and has no archiving). Objects appear to be cached even when I send requests with no-cache control:

gsutil cp test gs://mybucket
# test has default meta Cache-Control: public, max-age=3600
wget -S --no-cache http://storage.googleapis.com/mybucket/test
# OK saved as test
gsutil rm gs://mybucket/test
wget -S --no-cache http://storage.googleapis.com/mybucket/test
# Saved as test.1, why?

I ran wget --no-cache several times after removing the object. Sometimes it returned the cached test file, and sometimes it correctly returned HTTP 404. I ran the commands from a Google Compute Engine Ubuntu server with no cache configured, and I got the same results from a few machines outside of Google Cloud.

IMO the server should always return HTTP 404 once the object is removed. Is there a bug in the Google Cloud infrastructure?

Note: when I set the object meta Cache-Control: no-cache, it works as expected. But I think the server should never return cached content for wget --no-cache, even when the object has the default meta Cache-Control: public, max-age=3600.

Best Answer

You are correct. Google Cloud Storage currently ignores requests from anonymous clients to skip the cache, such as the no-cache request headers sent by wget --no-cache.

You can get around this by explicitly setting a different cache-control policy on the object, by requesting a specific generation of an object, or by making authorized requests.
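The three workarounds above can be sketched as small shell helpers. This is a minimal sketch under stated assumptions, not an official recipe: the function names are hypothetical, the bucket and object names are the ones from the question, the URLs use HTTPS instead of the question's HTTP so that the access token is not sent in cleartext, and the authorized request assumes the gcloud CLI is installed and logged in.

```shell
#!/bin/sh
# Hypothetical helpers; the function names are illustrative and not part of gsutil.

# Build the public URL of an object. With a third argument,
# pin the request to one specific generation of the object.
object_url() {
  if [ -n "${3:-}" ]; then
    printf 'https://storage.googleapis.com/%s/%s?generation=%s\n' "$1" "$2" "$3"
  else
    printf 'https://storage.googleapis.com/%s/%s\n' "$1" "$2"
  fi
}

# Workaround 1: replace the default "public, max-age=3600" policy
# with an explicit no-cache policy stored on the object itself.
disable_object_caching() {
  gsutil setmeta -h "Cache-Control:no-cache" "gs://$1/$2"
}

# Workaround 2: request a specific generation of the object.
# Generation numbers are listed by: gsutil ls -a gs://BUCKET/OBJECT
fetch_generation() {
  wget -S "$(object_url "$1" "$2" "$3")"
}

# Workaround 3: make an authorized request instead of an anonymous one.
# Assumes "gcloud auth print-access-token" returns a valid OAuth 2.0 token.
fetch_authorized() {
  curl -sS -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    "$(object_url "$1" "$2")"
}
```

For example, `disable_object_caching mybucket test` applies the first workaround to the object from the question, and `fetch_generation mybucket test GENERATION` fetches one generation, where GENERATION stands for a number taken from the `gsutil ls -a` output.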

Related Solutions

Why is the full content sometimes returned unchanged when If-Modified-Since is set

Can you use Firebug and paste the HTTP response headers? I have a feeling that even though your code sets Cache-Control to public, Apache is overriding it, because for the PHP file type you are setting Cache-Control to private.

One thing you can do is remove the cache settings for dynamic pages from the Apache configuration. That should fix the problem, because a proxy doesn't cache a response without the correct headers.

EDIT

Hi Sam, revisiting your question, I found the solution to the problem. The following code snippet appears to be the cause. In the output of your PHP, the Last-Modified header changes on every request, so when a browser sends a conditional If-Modified-Since request (expecting a 304), it sees a change and therefore re-requests the content.

header ("Last-Modified: " . gmdate("D, d M Y H:i:s", time() - 404800000)." GMT");

Unset Last-Modified and ETags from your content to speed up the website. This site provides some excellent tips too.
http://www.askapache.com/htaccess/apache-speed-last-modified.html

Can Apache2 be configured to return both HTTP 1.0 and 1.1

It's certainly possible for a web server to respond with both HTTP 1.0 and HTTP 1.1. For example, you can use web-sniffer.net to confirm this: send an "HTTP 1.0" request to Google.com and you'll get a 1.0 response back; send an HTTP 1.1 request and you'll get an HTTP 1.1 response back.

If you are setting headers that declare your content cacheable, it's quite fair for network appliances and users' browsers to cache it, as you have declared that this is allowed.

A solution to consider is to set the cache times to be low (or off) while you are making changes, so that changes are reflected right away. Then when you are done, turn caching back on, or raise the cache times.
