Possibility of stale data in cache-aside pattern

caching, concurrency, design-patterns, memcached

Just to recap, the cache-aside pattern defines the following steps when fetching and updating data (a minimal code sketch follows the two lists).

Fetching Item

  1. Return the item from cache if found in it.
  2. If not found in cache, read from data store.
  3. Put the read item in cache and return it.

Updating Item

  1. Write the item in data store.
  2. Remove the corresponding entry from cache.
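In code, the two flows look roughly like this. This is a minimal sketch, assuming in-memory dicts stand in for the memcached client and the data store; the names `cache`, `db`, `fetch_item`, and `update_item` are illustrative only.

```python
# Minimal cache-aside sketch. In-memory dicts stand in for a memcached
# client (`cache`) and a data store (`db`); adapt to real clients.

cache = {}   # stand-in for memcached: key -> value
db = {}      # stand-in for the data store: key -> value

def fetch_item(key):
    value = cache.get(key)          # 1. return from the cache if present
    if value is not None:
        return value
    value = db.get(key)             # 2. otherwise read from the data store
    if value is not None:
        cache[key] = value          # 3. populate the cache, then return
    return value

def update_item(key, value):
    db[key] = value                 # 1. write to the data store first
    cache.pop(key, None)            # 2. then remove the cache entry
```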

This works perfectly in almost all cases, but it seems to fail in one theoretical scenario.

What if steps 1 & 2 of updating an item happen between steps 2 & 3 of fetching it? In other words, suppose the data store initially holds the value 'A' and it is not in the cache. The fetching thread reads 'A' from the data store, but before it puts that value into the cache, another thread updates the item to 'B' (so 'B' is written to the data store, and the update then tries to remove the cache entry, which does not exist yet). Now the fetching thread puts the value it read (i.e. 'A') into the cache. 'A' will stay cached, and further fetches will return stale data until the item expires or is updated again.
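Replaying that interleaving by hand against the sketch above (deterministic steps instead of real threads, so the outcome is easy to see):

```python
# Replaying the race by hand, using the `cache` and `db` dicts from the sketch above.
db["song"] = "A"                    # initial state: 'A' in the store, not cached

# Fetching thread: step 2 of the fetch (read from the data store)
read_value = db.get("song")         # reads 'A'

# Updating thread now runs both of its steps:
db["song"] = "B"                    # write 'B' to the data store
cache.pop("song", None)             # invalidate; nothing cached yet, so a no-op

# Fetching thread resumes with step 3 (populate the cache)
cache["song"] = read_value          # caches the stale 'A'

assert fetch_item("song") == "A"    # stale until eviction or the next update
assert db["song"] == "B"            # the store already holds 'B'
```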

So am I missing something here? Is my understanding of the pattern wrong? Or is the scenario just practically impossible, so there is no need to worry about it?

Also I would like to know if some changes can be made in the pattern to avoid this problem.

Best Answer

You are correctly pointing out a race condition.

Cache-aside, as described here and here, is an imperfect abstraction that is not appropriate for all data storage use cases.

Consistency. Implementing the Cache-Aside pattern does not guarantee consistency between the data store and the cache. An item in the data store may be changed at any time by an external process, and this change might not be reflected in the cache until the next time the item is loaded into the cache. In a system that replicates data across data stores, this problem may become especially acute if synchronization occurs very frequently.

This text refers to the problems that arise when someone modifies the data store without informing the cache that it needs to invalidate the entry. (The pattern describes the need for an appropriate eviction policy, which is intended to limit the lifetime of data errors.)

Yet, the race condition you are pointing out can occur even if all clients are playing by the rules. When it happens, stale data will be deposited in the cache and remain there until it is otherwise evicted, either by standard (e.g. time-based) eviction or because the data at that key is once again updated, this time perhaps without the race.

Serving some stale data alongside up-to-date data is a worse kind of consistency breach than merely returning stale information that was at least internally consistent at some point in time (a snapshot). This mixing is sometimes called a form of read or write skew.

Also I would like to know if some changes can be made in the pattern to avoid this problem.

A problem with this pattern is that it spreads a single responsibility (data storage; holding state) across multiple components. So, a way to fix the race condition you're pointing out is to change the model so that the cache is a first-class entity with full responsibility for both reading and writing the data. Clients would ask the cache for data; the cache, when needed, would fetch the data from the data store and return it to the clients. Clients would inform the cache of updates, and the cache would update the data store. The cache would then be in a position to provide the appropriate synchronization, so it could avoid race conditions.
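As a rough illustration of that idea (a sketch only; the class name, the coarse single lock, and the dict-backed store are assumptions, not a prescription), the cache could own both paths and serialize them:

```python
import threading

# Sketch of a cache that owns both reads and writes. A lock serializes the
# read-and-populate and write paths, so the interleaving described above
# cannot occur. A single coarse lock keeps the sketch simple; per-key locks
# would scale better.

class OwningCache:
    def __init__(self, data_store):
        self._store = data_store        # e.g. a dict or a data store client
        self._cache = {}
        self._lock = threading.Lock()

    def get(self, key):
        with self._lock:
            if key in self._cache:
                return self._cache[key]
            value = self._store.get(key)    # read through to the store on a miss
            if value is not None:
                self._cache[key] = value
            return value

    def put(self, key, value):
        with self._lock:
            self._store[key] = value        # write through to the store
            self._cache[key] = value        # cache cannot hold a stale value here

# Usage: clients only ever talk to the cache, never the store directly.
store = {}
owning_cache = OwningCache(store)
owning_cache.put("song", "A")
assert owning_cache.get("song") == "A"
```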


Ultimately, this pattern is useful for data that doesn't change often, and for data storage that doesn't depend on set consistency or transactions over multiple keys. For example, it might work for certain kinds of document stores, such as a music library, especially if the library is mostly appended to, keys are never updated, and documents are only occasionally deleted. In that case it is acceptable (from the business perspective) to continue serving deleted documents for the eviction duration.
