Using 304 Not Modified and If-Modified-Since for REST API Caching

caching rest

I have the following scenario: a mobile client app consumes a REST API and caches the API data to save bandwidth and support offline use.

With REST, this can be done using the If-Modified-Since header. That is, the client's cache management system stores the HTTP-date and sends it as a header in the request to the API. The API checks it and responds with a '304 Not Modified' status if the resource hasn't been modified since the specified date.

The "If-Modified-Since" header field makes a GET or HEAD request
method conditional on the selected representation's modification date
being more recent than the date provided in the field-value.
Transfer of the selected representation's data is avoided if that
data has not changed
(rfc7232#section-3.3 If-Modified-Since).
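The client side of this exchange could be sketched roughly as follows. This is a minimal illustration, not a full HTTP client: `ClientCache`, `CacheEntry`, `request_headers` and `handle_response` are hypothetical names, and the transport is left out so that only the header handling is shown. It assumes the client echoes back the `Last-Modified` value it received from the server rather than a timestamp taken from the device clock, which avoids clock-skew problems.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class CacheEntry:
    body: bytes
    last_modified: str  # Last-Modified value exactly as the server sent it


class ClientCache:
    """Hypothetical client-side cache keyed by URL."""

    def __init__(self) -> None:
        self._entries: Dict[str, CacheEntry] = {}

    def request_headers(self, url: str) -> Dict[str, str]:
        # Only make the request conditional if a copy is already cached.
        entry = self._entries.get(url)
        if entry is None:
            return {}
        return {"If-Modified-Since": entry.last_modified}

    def handle_response(self, url: str, status: int,
                        headers: Dict[str, str], body: bytes) -> bytes:
        if status == 304:
            # Not modified: the server sent no body, so reuse the cached copy.
            return self._entries[url].body
        if status == 200:
            last_modified = headers.get("Last-Modified")
            if last_modified is not None:
                self._entries[url] = CacheEntry(body, last_modified)
            return body
        raise ValueError(f"unexpected status {status}")
```

Before each request the app would merge `request_headers(url)` into its outgoing headers, and afterwards pass the status, headers and body to `handle_response`.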

Assuming that a "resource" can be either a collection or a single record, the modified date would be that resource's modification date (i.e., for a collection of records, it would be the modification date of the entire collection). Internally, the API would run a database query to check whether the record or the collection was modified since then and produce the proper response.
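On the server, the check described above might look something like the sketch below. The function names `conditional_get` and `collection_last_modified` are made up for illustration, and the code assumes modification times are stored as timezone-aware UTC datetimes. Because HTTP dates only have one-second resolution, the stored timestamp is truncated to whole seconds before comparing.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Dict, Iterable, Optional, Tuple


def collection_last_modified(timestamps: Iterable[datetime]) -> datetime:
    # Assumption: a collection's date is the newest date among its members.
    return max(timestamps)


def conditional_get(if_modified_since: Optional[str],
                    last_modified: datetime) -> Tuple[int, Dict[str, str]]:
    """Return (status, headers) for a GET with an optional If-Modified-Since."""
    last_modified = last_modified.replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}
    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # an unparseable date is ignored, as if not sent
        if since is not None:
            if since.tzinfo is None:
                since = since.replace(tzinfo=timezone.utc)
            if last_modified <= since:
                return 304, headers
    return 200, headers
```

One caveat with taking the newest member timestamp for a collection: deleting a record does not change that maximum, so the collection would probably need a modification date of its own that is also updated on deletes.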

So far, this is how I intend to implement the caching strategy, keeping it as RESTful as I can.

The problem is that it's a high-impact design decision, and I'm not sure whether it's a good approach or whether there's a better way to implement this. The whole idea is based on logic and a little REST knowledge, and I couldn't find any resources on how to implement it.

I'd appreciate some critique of this model: whether it's good enough, whether there's a better approach, or any caveats that would improve the current idea.

Best Answer

If-Modified-Since works well for single resources where it is quick and easy to determine whether the resource has been modified. For example, HTTP servers have this functionality built in for static files, since they can use the file's timestamp. The general concept is that processing the resource is more expensive than determining its modification date. I would consider it for single resources, but not for collections.

There are several caching schemes to choose from, and you should use only the one most appropriate to your need. In some cases, caching is detrimental. Examples:

  • The results are bound to change each request (i.e. high rate of change)
  • Proxies can break caching schemes that are user specific (e.g. an inbox scenario)
  • The cost of determining whether the item is newer is the same as retrieving it (the time difference between checking just a date and fetching a whole record from a database is usually negligible, and making two queries when one will do actually makes things worse)
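To illustrate the last point, one way to avoid the second query is to fetch the record together with its timestamp and decide afterwards whether to send it. The sketch below uses an in-memory SQLite table purely as an example. The `records` table, its columns, and the `fetch_if_modified` function are assumptions made for illustration, with `updated_at` stored as whole epoch seconds in UTC.

```python
import sqlite3
from datetime import datetime, timezone
from typing import Optional, Tuple


def fetch_if_modified(conn: sqlite3.Connection, record_id: int,
                      since: Optional[datetime]) -> Tuple[int, Optional[str]]:
    """Return (status, body) using a single query for both date and data."""
    row = conn.execute(
        "SELECT body, updated_at FROM records WHERE id = ?", (record_id,)
    ).fetchone()
    if row is None:
        return 404, None
    body, updated_at = row
    modified = datetime.fromtimestamp(updated_at, tz=timezone.utc)
    if since is not None and modified <= since:
        return 304, None  # the row was read anyway, but no body is transferred
    return 200, body
```

With this shape the saving is only in bandwidth, not in database work, which may be all a mobile client needs.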

I've been bitten by a number of proxy-induced caching bugs, particularly with old, poorly maintained proxy servers in customer networks. Test with multiple users to make sure that your caching efforts don't actually make things worse.