Best Ways to Understand How to Cache Domain Objects in Java


I've always done this wrong, and I'm sure a lot of others have too: hold a reference in a map and write through to the DB, etc.

I need to do this right, and I just don't know how to go about it. I know how I want my objects to be cached, but I'm not sure how to achieve it. What complicates things is that I need to do this for a legacy system where the DB can change without notice to my application.

So in the context of a web application, let's say I have a WidgetService which has several methods:

Widget getWidget();
Collection<Widget> getAllWidgets();
Collection<Widget> getWidgetsByCategory(String categoryCode);
Collection<Widget> getWidgetsByContainer(Integer parentContainer);
Collection<Widget> getWidgetsByStatus(String status);

Given this, I could cache by method signature, i.e. getWidgetsByCategory("AA") would get its own cache entry. Or I could cache widgets individually, which I believe would be difficult. Or a call to any method could first cache ALL widgets via getAllWidgets(), and that load would also populate cache entries matching the keys of the other method invocations. For example, take the following untested, theoretical code.

Collection<Widget> getAllWidgets() {
    Entity entity = cache.get("ALL_WIDGETS");
    Collection<Widget> res;
    if (entity == null) {
        // Cache miss: load everything and populate the derived entries too
        res = loadCache();
    } else {
        res = (Collection<Widget>) entity.getValue();
    }
    return res;
}

Collection<Widget> loadCache() {
    // Get widgets from underlying DB
    Collection<Widget> res = db.getAllWidgets();
    cache.put("ALL_WIDGETS", res);
    Map<String, List<Widget>> byCat = new HashMap<>();
    for (Widget w : res) {
        // cache by different types of method calls, i.e. by category;
        // every list holds references to the same Widget instances
        byCat.computeIfAbsent(w.getCategory(), k -> new ArrayList<>()).add(w);
    }
    cacheCategories(byCat);
    return res;
}

Collection<Widget> getWidgetsByCategory(String categoryCode) {
    CategoryCacheKey key = new CategoryCacheKey(categoryCode);
    Entity ent = cache.get(key);
    if (ent == null) {
        // Miss: reload everything, then look the key up again
        loadCache();
        ent = cache.get(key);
    }
    return ent == null ? Collections.<Widget>emptyList() : (Collection<Widget>) ent.getValue();
}
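For comparison, here is a minimal, runnable sketch of the same load-everything-and-index idea using only the JDK. It is an illustration under assumptions, not a drop-in replacement: `Widget` is reduced to an id and a category, the DB call is a hypothetical `Supplier<List<Widget>>`, and all names (`WidgetSnapshot`, `CachingWidgetService`, `refresh()`) are made up for the example. Each load is wrapped in one immutable snapshot that is swapped in atomically, so the "all" list and the per-category lists always come from the same load and share the same `Widget` instances.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical, trimmed-down domain object: only the fields this sketch needs.
final class Widget {
    final int id;
    final String category;

    Widget(int id, String category) {
        this.id = id;
        this.category = category;
    }

    String getCategory() {
        return category;
    }
}

// Immutable result of one load: the full list plus an index derived from it.
final class WidgetSnapshot {
    final List<Widget> all;
    final Map<String, List<Widget>> byCategory;

    WidgetSnapshot(List<Widget> widgets) {
        this.all = Collections.unmodifiableList(new ArrayList<>(widgets));
        Map<String, List<Widget>> index = new HashMap<>();
        for (Widget w : all) {
            // The same Widget instance goes into its category list; nothing is copied.
            index.computeIfAbsent(w.getCategory(), k -> new ArrayList<>()).add(w);
        }
        index.replaceAll((k, v) -> Collections.unmodifiableList(v));
        this.byCategory = Collections.unmodifiableMap(index);
    }
}

final class CachingWidgetService {
    private final Supplier<List<Widget>> loadAllFromDb; // hypothetical DB call
    private final AtomicReference<WidgetSnapshot> snapshot = new AtomicReference<>();

    CachingWidgetService(Supplier<List<Widget>> loadAllFromDb) {
        this.loadAllFromDb = loadAllFromDb;
    }

    // Loads lazily on first use; two racing first calls may both hit the DB,
    // but only one snapshot is kept.
    private WidgetSnapshot current() {
        WidgetSnapshot s = snapshot.get();
        if (s == null) {
            snapshot.compareAndSet(null, new WidgetSnapshot(loadAllFromDb.get()));
            s = snapshot.get();
        }
        return s;
    }

    // Rebuilds every view from a fresh load and swaps it in atomically,
    // e.g. from a scheduled job, since the legacy DB can change without notice.
    void refresh() {
        snapshot.set(new WidgetSnapshot(loadAllFromDb.get()));
    }

    Collection<Widget> getAllWidgets() {
        return current().all;
    }

    Collection<Widget> getWidgetsByCategory(String categoryCode) {
        return current().byCategory.getOrDefault(categoryCode, Collections.emptyList());
    }
}
```

Rebuilding the whole snapshot on refresh is the price of keeping every view consistent; it suits a data set small enough to load in one go.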

NOTE: I have not worked with a cache manager. The above code treats cache as some object that holds entries as key/value pairs, though it's not modelled on any specific implementation.

Using this, I have the benefit of being able to cache all objects in the different ways they will be requested while keeping only a single copy of each object on the heap, whereas if I were to cache the method invocations via, say, Spring, it would (I believe) cache multiple copies of the objects.

I really want to understand the best ways to cache domain objects before I go down the wrong path and make things harder for myself later. I have read the documentation on the Ehcache website and found various articles of interest, but nothing that gives a good, solid technique.

Since I'm working with an ERP system, some DB calls are very complicated. It's not that the DB is slow, but the business representation of the domain objects makes them very clumsy to retrieve. Coupled with the fact that the information can live in 11 different DBs, which this application consolidates into a single view, this makes caching quite important.

Best Answer

To be useful, a cache must provide access to data faster than it can be retrieved from the database. Given that most database calls involve a network round trip, it makes sense to cache whenever it is known that the output (value) will not change for the same input (key) over a known dimension (time, size, etc.).

Thus, if the inputs are unpredictable or undecipherable, and so cannot be bound to a single key, then caching may not be the best solution: you'll just end up trying to write (and maintain) a poor-quality database. Invest the money in a faster network instead.
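When the known dimension is time, the idea can be sketched as a read-through cache whose entries are trusted only for a fixed time-to-live. This is a minimal sketch under assumptions: `TtlCache`, the `loader` function (standing in for the real DB query) and the injected clock are hypothetical names, and concurrent misses on the same key may both call the loader. Libraries such as Guava (`expireAfterWrite`) provide the same behaviour ready-made.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Read-through cache: a value is reused for the same key until it is ttlMillis old.
final class TtlCache<K, V> {
    // One cached value plus the time it was loaded.
    private static final class Entry<T> {
        final T value;
        final long loadedAtMillis;

        Entry(T value, long loadedAtMillis) {
            this.value = value;
            this.loadedAtMillis = loadedAtMillis;
        }
    }

    private final Map<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final Function<K, V> loader; // hypothetical: the real DB query
    private final long ttlMillis;
    private final LongSupplier clock;    // injectable so expiry can be tested

    TtlCache(Function<K, V> loader, long ttlMillis, LongSupplier clock) {
        this.loader = loader;
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    V get(K key) {
        long now = clock.getAsLong();
        Entry<V> e = entries.get(key);
        if (e == null || now - e.loadedAtMillis >= ttlMillis) {
            // Missing or stale: go back to the source of truth and remember the answer.
            e = new Entry<>(loader.apply(key), now);
            entries.put(key, e);
        }
        return e.value;
    }
}
```

In production the clock would simply be `System::currentTimeMillis`; the TTL bounds how stale a value can get when the legacy DB changes underneath the application.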

It doesn't really matter if your cache contains more than one copy of a particular object, so long as the key is useful to a downstream consumer. The objective of the cache is to improve performance for different consumers of the data, which may approach the underlying dataset from different standpoints (say, Customer-centric or Invoice-centric).

In terms of implementation at a small scale, you might want to look at the Cache classes in the Guava library instead of Ehcache. A typical example of a self-populating cache would be:

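// Key, Graph, MY_LISTENER and createExpensiveGraph are placeholders for your own types and code
LoadingCache<Key, Graph> graphs = CacheBuilder.newBuilder()
       .maximumSize(1000)                        // evict once more than 1000 entries are held
       .expireAfterWrite(10, TimeUnit.MINUTES)   // evict entries 10 minutes after they were loaded
       .removalListener(MY_LISTENER)             // notified whenever an entry is removed
       .build(
           new CacheLoader<Key, Graph>() {
             @Override
             public Graph load(Key key) throws Exception {
               // Called automatically on a cache miss
               return createExpensiveGraph(key);
             }
           });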

As you can see, it is straightforward to work with, and Guava provides a wide variety of cache eviction strategies (size, reference, age, etc.). The Guava library also provides a wealth of useful utility classes that augment those found in the JDK, so your overall codebase will benefit.
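Guava handles eviction for you, but to show what the size-based strategy does, least-recently-used eviction can be sketched with nothing but the JDK's `LinkedHashMap`. `LruCache` is a hypothetical name; the access-order constructor flag and the `removeEldestEntry` hook are standard `LinkedHashMap` API. Unlike Guava's caches, this class is not thread-safe.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Size-bounded map that evicts the least recently used entry.
// Not thread-safe: wrap with Collections.synchronizedMap(...) if shared between threads.
final class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruCache(int maxEntries) {
        super(16, 0.75f, true); // true = iteration follows access order, not insertion order
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Invoked after each put; returning true drops the least recently accessed entry.
        return size() > maxEntries;
    }
}
```

Because `get` counts as an access, frequently read widgets stay in the map while rarely used ones fall out once the size limit is reached.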

Your approach of decorating your DAO methods with a class that combines a dedicated cache and the DAO result set under a derived key is sound. Each call to the method performs a local key lookup first and only reaches the DAO on a miss, which is what you appear to be looking for. Couple this with an eviction strategy tuned for each method and you have a solution that is simple to understand and maintain and should scale well.
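A minimal sketch of such a decorator, assuming a hypothetical `WidgetDao` interface with two of the finder methods (widgets are represented by their names as `String`s to keep it short). `CachingWidgetDao` implements the same interface, keeps one dedicated cache per method keyed by the method's argument, and only delegates to the real DAO on a miss. The per-method `clear...` methods are a crude stand-in for the tuned eviction strategies mentioned above.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical DAO interface; the real one would mirror WidgetService's methods.
interface WidgetDao {
    List<String> getWidgetsByCategory(String categoryCode);
    List<String> getWidgetsByStatus(String status);
}

// Decorator: same interface, one dedicated cache per method, keyed by the argument.
final class CachingWidgetDao implements WidgetDao {
    private final WidgetDao delegate;
    private final Map<String, List<String>> byCategory = new ConcurrentHashMap<>();
    private final Map<String, List<String>> byStatus = new ConcurrentHashMap<>();

    CachingWidgetDao(WidgetDao delegate) {
        this.delegate = delegate;
    }

    @Override
    public List<String> getWidgetsByCategory(String categoryCode) {
        // Local key lookup first; only a miss reaches the database.
        return byCategory.computeIfAbsent(categoryCode, delegate::getWidgetsByCategory);
    }

    @Override
    public List<String> getWidgetsByStatus(String status) {
        return byStatus.computeIfAbsent(status, delegate::getWidgetsByStatus);
    }

    // Each method's cache can be cleared independently, on its own schedule.
    void clearCategoryCache() {
        byCategory.clear();
    }

    void clearStatusCache() {
        byStatus.clear();
    }
}
```

One caveat with this shortcut: `ConcurrentHashMap.computeIfAbsent` may block other updates to the map while the loader runs, so slow queries can hold up writers. A library cache such as Guava's `LoadingCache` avoids that and adds real size and time based eviction.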
