How does Squid know if its cache is still valid

reverse-proxy squid

In reverse proxy mode, Squid can cache content from websites previously accessed by devices within the network.

What happens if the content on the remote site changes in some way, perhaps by a code push? How does Squid know it needs to go to the original site to get a new version of the asset, instead of serving it from its cache?

Is this more of an issue now with dynamic JavaScript-based (single-page) sites?

A side question: Is "reverse proxy" essentially the same thing as "accelerator mode" for Squid?

Best Answer

Yes, Squid will cache the responses from the back-end server(s), using the conventional method of interpreting the caching headers the back-end server sends with each response.

A typical response for dynamic content that shouldn't be cached looks something like:

Expires: Fri, 25 Jul 2014 10:19:36 GMT
Cache-Control: max-age=0, no-cache, no-store
Pragma: no-cache

Technically each of those headers by itself is already sufficient to mark the response as dynamic, but conventional wisdom still seems to be to send them all. Cargo cult programming or backwards compatibility?

Cache-Control is the header you should be most concerned with. It carries the caching instructions for your Squid reverse proxy as well as for any intermediate caching proxy servers, up to and including the actual browser. The options are:

  • private or public: a private response is specific to one user and must not be stored by a shared cache such as Squid (the user's own browser may still cache it); a public response may be cached by any cache.
  • no-cache does mostly what it sounds like: it is an instruction to revalidate the resource with the origin server before each reuse. Once validation confirms that the resource is unchanged, the cached copy can still be served.
  • no-store is a clear instruction that the response must be treated as confidential and not stored at all, which is stronger than the no-cache option above.
  • max-age, in seconds, overrides the Expires header and states how long the asset stays fresh; after that the cached copy is stale and has to be revalidated or fetched again.
    • s-maxage, in seconds, is the same as max-age but applies only to shared caches, such as content delivery networks or a Squid reverse proxy, where it takes precedence over max-age.
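To make the list above concrete, here is a minimal Python sketch of how a shared cache could classify a response from its Cache-Control value. It is not Squid's actual implementation; the function names `parse_cache_control` and `shared_cache_policy` are made up for illustration, and the parser deliberately ignores quoted directive arguments that contain commas.

```python
def parse_cache_control(value):
    """Split a Cache-Control header value into a dict of directives.

    Simplification: splits on every comma, so quoted arguments that
    themselves contain commas are not handled.
    """
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        # Directives without an argument (e.g. "public") map to None.
        directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


def shared_cache_policy(value):
    """Classify a response from the point of view of a shared cache."""
    directives = parse_cache_control(value)
    if "no-store" in directives or "private" in directives:
        return "do not store"
    if "no-cache" in directives:
        return "store, but revalidate before every reuse"
    return "store and reuse while fresh"
```

With the dynamic example from earlier, `shared_cache_policy("max-age=0, no-cache, no-store")` falls into the first branch because no-store is the strictest directive present.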

Expires is the classic way of setting caching instructions: a simple timestamp, conventionally no more than 1 year in the future.

Pragma is a really old-school header from the HTTP/1.0 days. Setting it to no-cache will be interpreted by any recent browser as Cache-Control: no-cache. The more recent HTTP specifications deprecate it, although it is still honoured for backward compatibility.
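The precedence between these headers can be sketched in a few lines of Python. This is only an illustration of the rule "s-maxage beats max-age, which beats Expires", not how Squid is written; `freshness_lifetime` and the lower-case `headers` dict are assumptions of the sketch.

```python
from email.utils import parsedate_to_datetime


def freshness_lifetime(headers, shared=True):
    """Return the freshness lifetime in seconds, or None if unspecified.

    `headers` is a plain dict with lower-case header names (an assumption
    of this sketch, not a Squid data structure).
    """
    directives = {}
    for part in headers.get("cache-control", "").split(","):
        name, _, arg = part.strip().partition("=")
        if name:
            directives[name.strip().lower()] = arg.strip()
    # s-maxage only applies to shared caches and wins over max-age.
    names = ("s-maxage", "max-age") if shared else ("max-age",)
    for name in names:
        if directives.get(name, "").isdigit():
            return int(directives[name])
    # Fall back to Expires minus Date when neither directive is present.
    if "expires" in headers and "date" in headers:
        try:
            expires = parsedate_to_datetime(headers["expires"])
            date = parsedate_to_datetime(headers["date"])
            return max(0, int((expires - date).total_seconds()))
        except (TypeError, ValueError):
            return 0  # an unparsable date is treated as already expired
    return None
```

Called with `shared=True` the function models a proxy such as Squid; with `shared=False` it models a browser cache, which ignores s-maxage.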

The headers set for more static content should instruct Squid (as well as the web browsers of your visitors) that those responses can be cached.

Cache-Control: no-transform,public,max-age=300,s-maxage=900
Content-Type: text/html; charset=UTF-8
Date: Fri, 25 Jul 2014 10:19:36 GMT
Expires: Sat, 26 Jul 2014 10:19:36 GMT

The problem is that, unless you manually flush the Squid cache contents, objects will be served from the cache until they expire according to their Cache-Control headers. Out of the box Squid doesn't honour PURGE requests to invalidate specific cached objects the way Varnish or the software that CDNs use does; the PURGE method has to be enabled explicitly with an ACL in squid.conf.

The usual work-around is to have your content management solution ensure that updates to static content come with new filenames, rather than overwriting existing files.
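One common way to get new filenames on every update is to embed a hash of the file contents in the name. The following is a minimal sketch of that idea, not something any particular CMS does; `versioned_name` and the 8-character digest length are arbitrary choices for illustration.

```python
import hashlib
from pathlib import PurePosixPath


def versioned_name(path, content, digest_chars=8):
    """Insert a short content hash before the file extension.

    `content` is the file body as bytes; any change to it yields a
    different name, so caches never see a stale object under the new URL.
    """
    digest = hashlib.sha256(content).hexdigest()[:digest_chars]
    p = PurePosixPath(path)
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))
```

Because the name changes whenever the content does, such files can be served with a very long max-age, while the HTML page that references them keeps a short one.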

Of course your local Squid configuration (for example refresh_pattern rules) can override the instructions set in the headers.

And yes, in the Squid context a reverse proxy and a web accelerator are the same thing.
