Database – What's the best way to cache a growing database table for HTML generation?

caching · database · web-development

I've got a database table that will grow by about 5,000 rows an hour. For any given key that I query by, the result set grows by about 1 row every hour. I would like a web page to show the latest rows for a key, 50 at a time (this is configurable). I would like to implement memcache to keep database read activity low.

If I run a query and create a cached result for each page of 50 results, that works until a new entry is added. At that point, the page of latest results gains the new row and its oldest row drops off onto the next page. This cascades down the list of cached pages, forcing me to update every cached result. It seems like a poor design.

I could build the cache pages backwards instead; then for each page requested I would fetch the latest 2 pages and truncate to the proper length of 50. I'm not sure whether this is a good or a bad approach.
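The "backwards" idea can be sketched as follows: number pages from the oldest row upward so that cache keys are stable, and only the last, partially filled page ever changes when a row arrives. This is a minimal sketch using a plain dict in place of memcache; the key scheme and function names are illustrative, not from any particular library.

```python
# Sketch of "backwards" paging: pages are numbered from the oldest row
# upward, so cache keys are stable and only the last (partially filled)
# page changes when a new row arrives. A dict stands in for memcache.
PAGE_SIZE = 50

def page_key(key, page_no):
    return f"{key}:page:{page_no}"

def cache_pages(cache, key, rows):
    # rows are ordered oldest-first; split them into fixed pages.
    num_pages = (len(rows) + PAGE_SIZE - 1) // PAGE_SIZE
    for page_no in range(num_pages):
        start = page_no * PAGE_SIZE
        cache[page_key(key, page_no)] = rows[start:start + PAGE_SIZE]

def latest(cache, key, total_rows):
    # The newest PAGE_SIZE rows span at most the last two pages,
    # so fetch both and keep only the newest PAGE_SIZE entries.
    last_page = (total_rows - 1) // PAGE_SIZE
    rows = cache.get(page_key(key, last_page - 1), [])
    rows = rows + cache[page_key(key, last_page)]
    return rows[-PAGE_SIZE:]
```

With this numbering, inserting a row only rewrites the highest-numbered page; the cache entries for older pages never need to be touched.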

Ideally, the mechanism I use to insert a new row would also know how to invalidate the proper cache results.
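One way to get that behaviour is write-side invalidation: the code that inserts a row computes which cached page the row lands on and drops only that one cache entry. In the hypothetical sketch below, `db_rows` stands in for the table, a dict stands in for memcache, and the `"<key>:page:<n>"` key scheme is an assumption for illustration.

```python
# Hypothetical write path that couples the insert to cache invalidation.
PAGE_SIZE = 50

def insert_row(db_rows, cache, key, row):
    db_rows.append(row)
    # With pages numbered from the oldest row upward, a new row can only
    # land on the last page, so that is the only entry to invalidate.
    page_no = (len(db_rows) - 1) // PAGE_SIZE
    cache.pop(f"{key}:page:{page_no}", None)
```

The next read of that page misses the cache, re-runs the query, and repopulates the single stale entry; all other pages stay cached.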

Has someone already solved this problem in a widely acceptable way? What's the best method of doing this?

EDIT:

If my understanding of the MySQL query cache is correct, it invalidates with table-level granularity. Given that roughly 5,000 other rows are inserted before a query on a given key actually needs fresh results, the database query cache would effectively never be hit. MS SQL caches execution plans and frequently accessed data pages, so it may do better in this scenario.

My query is not against a single table with TOP N. One version has joins to several tables and another has sub-selects.

Also, since I want to cache the generated HTML table, I'm wondering if a cache at the web server level would be appropriate. Is there really no benefit to any type of caching? Is the best advice really to let every page request go through all the layers and hit the database?

Best Answer

Unless I'm misunderstanding the question, I don't think that this is an appropriate scenario for caching.

Cached data normally has at least one of the following attributes (usually all of them):

  • Expensive to retrieve or compute;
  • Highly static - may change occasionally but very rarely;
  • Non-critical - OK if the requester sees stale data.

It doesn't sound like any of these apply to your situation.

  • The query is a simple SELECT, probably TOP N, just an index seek;
  • It changes very frequently;
  • Your requirements indicate that immediate updates are required.

So why are you caching? Caching isn't a panacea; oftentimes it can actually make performance worse, if the cache memory could be better used for some other purpose.

Databases do their own caching. As long as the DB server has plenty of memory then it may cache the entire table in memory if it's frequently queried; the performance of that will be just as good as your cache if not better.

Some further thoughts/suggestions:

  • If stale data is OK, then the simplest solution is a fixed expiration interval (TTL). This method is used very effectively in hundreds of thousands of sites and systems. You can either force a refresh on expiration or just recompute the entry the next time it's requested.

  • If you're concerned about conflicts between reads and writes, then (a) don't be, until you've profiled it, and (b) if it really is an issue then instead of trying to cache it, just use a redundant table or a NOLOCK hint.

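The fixed-interval suggestion above can be sketched as a small TTL cache: on a miss or an expired entry, the caller re-runs the query and re-populates the entry. The clock is injected so expiry is testable; the class and method names are illustrative, not from any library.

```python
import time

# Minimal fixed-interval (TTL) cache sketch.
class TTLCache:
    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}  # key -> (expires_at, value)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None or self.clock() >= entry[0]:
            return None  # missing or stale: caller re-queries the database
        return entry[1]

    def set(self, key, value):
        self._store[key] = (self.clock() + self.ttl, value)
```

Whether this is acceptable depends entirely on how stale the page of latest rows is allowed to be; with roughly one new row per key per hour, even a short TTL keeps the data nearly current.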
If you need to invalidate the cache every single time a row is added/changed then you have completely defeated the purpose of an application cache, and are now trying to implement an in-memory database. Please don't do this unless you have an extremely good reason for it.
