Concurrency Caching – Ensuring Latest Version in Cache

caching, concurrency

We have a data service application working on object graphs. We place some complex graphs in a (memory) caching tier as a single data item, to avoid the time it would take to retrieve every individual data item from disk storage in order to build the entire hierarchy.

That means whenever any of the data items changes, a re-cache event is triggered – reloading everything in the hierarchy and storing the newer version in the cache.

Recently we made tweaks to the service to increase the concurrency of its asynchronous operations (mostly network I/O), and observed contention when multiple incoming requests update items in essentially the same object graph.

Each update operation is unaware of other concurrent updates. At the storage persistence level, the infrastructure keeps timestamps on the data rows, allowing us to apply optimistic concurrency techniques so that nobody writes an outdated version back to storage.
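A minimal sketch of what that storage-level check might look like, assuming a timestamp column and a conditional UPDATE (the table and column names here are illustrative, not from the original post):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Timestamp;

public class ChildItemRepository {

    /**
     * Writes the update only if the row still carries the timestamp we read earlier.
     * Returns false when another update slipped in first (zero rows affected).
     */
    public boolean tryUpdate(Connection conn, long id, String payload,
                             Timestamp readTimestamp) throws SQLException {
        String sql = "UPDATE child_items "
                   + "SET payload = ?, last_modified = CURRENT_TIMESTAMP "
                   + "WHERE id = ? AND last_modified = ?";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, payload);
            ps.setLong(2, id);
            ps.setTimestamp(3, readTimestamp);
            return ps.executeUpdate() == 1;   // 0 rows => stale write rejected
        }
    }
}
```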

A change in any child object causes the root object to increment a version number, which triggers the re-cache. During such an event, other concurrent updates may happen to any other child object (or even the same one), which makes it difficult to tell what a particular version of the root object is supposed to contain at the point of update. Worse, and the main concern here, is that because concurrent threads contend for processor time, the caching attempt for an older version can happen after the caching attempt for a newer version.

Right now we have minimised this behaviour by verifying that the root object's version at the time of the re-cache event still matches the version saved during its own update operation. If the version number is not the same, another concurrent update has since changed it further and the re-cache should not proceed – let the last update operation perform the re-cache.
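A rough sketch of that mitigation, assuming placeholder types (Storage, Cache, RootGraph are hypothetical names, not anything from our codebase):

```java
// Minimal supporting types assumed by the sketch.
interface Storage {
    RootGraph loadRoot(String rootId);          // cheap read of the root row only
    RootGraph loadWithChildren(String rootId);  // full hierarchy re-fetch
}

interface Cache {
    void put(String rootId, RootGraph graph);
}

record RootGraph(String id, long version) {}

final class ReCacher {
    private final Storage storage;
    private final Cache cache;

    ReCacher(Storage storage, Cache cache) {
        this.storage = storage;
        this.cache = cache;
    }

    /** versionAtUpdate is the root version recorded by the update that triggered this re-cache. */
    void reCache(String rootId, long versionAtUpdate) {
        RootGraph root = storage.loadRoot(rootId);
        if (root.version() != versionAtUpdate) {
            // Another update has already bumped the version; let that update's own
            // re-cache event publish the newer graph instead of racing it here.
            return;
        }
        // Note: the race described below can still occur while the children are re-fetched.
        RootGraph fullGraph = storage.loadWithChildren(rootId);
        cache.put(rootId, fullGraph);
    }
}
```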

That of course minimises the chances but does not eliminate them outright; it is still possible for the root object to retain its original version number on re-fetch, and, while all the child objects are being re-fetched, another update can happen and, by a stroke of (ill) luck, complete and re-cache before this ongoing re-caching attempt has finished its round.

To further complicate the issue, this data service is deployed as a cluster of servers that have no knowledge of one another.

I was wondering if there is an effective way to manage concurrent updates across multiple threads and servers so that the cached copy is indeed always the latest copy?

Best Answer

In the end, it seems the cache tier also needs some form of optimistic concurrency.

Instead of blindly writing whatever "new" version the application has to the cache tier, it must first retrieve the existing copy (if cached) to determine the cached version and the current optimistic concurrency key.

E.g. Couchbase Server provides CAS values; Azure Blob Storage provides ETags.

If the currently cached version is newer than the attempted version, the attempt aborts since it has been superseded.

If the currently cached version is older, the cache attempt proceeds using the last-known optimistic concurrency key value. If another thread or operation has re-cached in between, the optimistic concurrency key will have changed and the current cache attempt will fail. In that event, it has to re-fetch the new item version and optimistic concurrency key, and repeat the cycle.
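A sketch of the whole read-compare-write cycle. VersionedCache here is a hypothetical abstraction standing in for something like Couchbase's CAS or Azure Blob Storage ETags; the method names are illustrative, not a real SDK API:

```java
import java.util.Optional;

interface VersionedCache {
    /** Returns the cached graph plus the concurrency token (CAS/ETag) it was read with. */
    Optional<CachedEntry> get(String key);
    /** Replaces the entry only if the token still matches; returns false otherwise. */
    boolean replaceIfMatches(String key, RootGraph graph, long token);
    /** Inserts only if the key is absent; returns false otherwise. */
    boolean insertIfAbsent(String key, RootGraph graph);
}

record CachedEntry(RootGraph graph, long token) {}
record RootGraph(String id, long version) {}

final class CacheWriter {
    private final VersionedCache cache;

    CacheWriter(VersionedCache cache) {
        this.cache = cache;
    }

    /** Keeps retrying until our graph is in the cache or a newer version already is. */
    void publish(RootGraph candidate) {
        while (true) {
            Optional<CachedEntry> current = cache.get(candidate.id());

            if (current.isEmpty()) {
                if (cache.insertIfAbsent(candidate.id(), candidate)) return;
                continue; // someone inserted concurrently; re-read and compare versions
            }

            CachedEntry entry = current.get();
            if (entry.graph().version() >= candidate.version()) {
                return; // cache already holds this version or newer; we were superseded
            }
            if (cache.replaceIfMatches(candidate.id(), candidate, entry.token())) {
                return; // won the race
            }
            // Token changed between read and write: another writer got in first.
            // Loop around to re-read, compare versions again, and retry only if still older.
        }
    }
}
```

The version comparison on each read is what lets a writer holding an older graph abandon its attempt instead of retrying forever; only a writer that is genuinely newer keeps looping.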

This of course comes at the cost of spending at least one extra read operation simply to determine whether it is safe to write a new version, but given that data consistency is paramount, I guess that is a price that has to be paid.
