Shared Cache – Invalidation Best Practice

azure, caching, memcached, performance

I'd like to know which would be the better approach to invalidating/updating cached objects.

Prerequisites

  • A remote memcached server, serving as a shared cache for multiple applications (a connection sketch follows below the list)
  • All servers are hosted on Azure (affinity regions, same data centers)
  • Cache object sizes range from 200 bytes up to 50 kilobytes
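For reference, a minimal sketch of talking to such a shared memcached server from Python; the pymemcache client library, hostname, key, and expiry value are assumptions, since the question doesn't name a client or language.

    # Connect to the shared remote memcached server (hostname/port are placeholders).
    from pymemcache.client.base import Client

    cache = Client(("shared-cache.example.internal", 11211))

    # Store a small serialized object for five minutes, then read it back.
    cache.set("object:A", b"serialized-object-A", expire=300)
    print(cache.get("object:A"))  # b'serialized-object-A' on a hit, None on a miss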

Approach 1 (store in cache asap)

  1. Object A is created -> store in database and store in cache
  2. Object A requested by client -> check cache for existence, otherwise fetch from database and store in cache
  3. Object A gets updated -> store in database, store in cache

Approach 1 seems more straightforward: if something is created, put it in the cache as soon as possible, regardless of whether anyone will ever request it.
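A minimal sketch of this write-through flow in Python; the plain dicts stand in for the real database and the remote memcached server, and the function names are illustrative only.

    # Approach 1 (write-through): every create/update goes to both stores.
    database = {}  # stand-in for the persistent store
    cache = {}     # stand-in for the shared memcached server

    def create_object(obj_id, obj):
        database[obj_id] = obj  # 1. persist the new object
        cache[obj_id] = obj     #    and push it into the cache immediately

    def get_object(obj_id):
        obj = cache.get(obj_id)     # 2. try the cache first
        if obj is None:
            obj = database[obj_id]  #    miss: fall back to the database
            cache[obj_id] = obj     #    and repopulate the cache
        return obj

    def update_object(obj_id, obj):
        database[obj_id] = obj  # 3. persist the new version
        cache[obj_id] = obj     #    and overwrite the cached copy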

Approach 2 (lazy cache store)

  1. Object A is created -> store in database
  2. Object A requested by client -> check cache for existence, otherwise fetch from database and store in cache
  3. Object A gets updated -> store in database, delete key in cache

Approach 2 seems to be more memory-aware: only items that are actually requested end up in the cache.
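A sketch of the lazy flow, reusing the same stand-ins and illustrative names as above; only the create and update paths differ from approach 1.

    # Approach 2 (lazy store): the cache is only populated on reads.
    def create_object(obj_id, obj):
        database[obj_id] = obj  # 1. persist only; the cache stays untouched

    def get_object(obj_id):
        obj = cache.get(obj_id)     # 2. try the cache first
        if obj is None:
            obj = database[obj_id]  #    miss: read from the database
            cache[obj_id] = obj     #    and cache it for later readers
        return obj

    def update_object(obj_id, obj):
        database[obj_id] = obj   # 3. persist the new version
        cache.pop(obj_id, None)  #    and delete the now-stale cache entry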

Question 1: In terms of performance, which approach would be better? Memory and CPU usage do not matter yet.

Question 2: Are my thoughts a kind of premature optimization?

Question 3: Any other thoughts? Other approaches?

Best Answer

  1. Is unanswerable, except to say it depends. There are a lot of factors which will determine which approach is going to be the best in your case, e.g.: Is it normal for created objects to be retrieved shortly after they are created? What's the ratio of updates to accesses?
  2. Re. deciding you need a cache: If you're optimising without data then yes, it's technically premature optimisation. I say technically since experience/conventional wisdom may tell you you're going to need a cache of some sort. Re. deciding how the cache will best work: yes, it's definitely premature optimisation.
    • Optimisation often isn't about finding the best/most optimal solution. It should go as follows:
      1. Find the bottlenecks in the system.
      2. Find where you can make the biggest difference with the least amount of work.
      3. Do the least amount of work!
      4. Is it fast enough yet? If not, go to #1.
      5. Done!
    • Honestly, neither of the approaches you describe sound complicated. Why not implement both and see which works best?
    • Step 3 in approach #2 could be changed to "Object A gets updated -> store in database, update entry in cache" (sketched below).
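A sketch of that variant, using the same illustrative stand-ins as the question's examples: refreshing the entry keeps the next read a cache hit instead of forcing a round trip to the database.

    # Variant of approach 2, step 3: update the cache entry instead of deleting it.
    def update_object(obj_id, obj):
        database[obj_id] = obj  # persist the new version
        cache[obj_id] = obj     # overwrite the stale entry so the next read hits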