Java – Cache vs DB design decision

caching, design, enterprise-architecture, java, persistence

This question comes up often for me and my team: should we persist the data or cache it? I understand that sometimes there is a functional
requirement to persist data in a DB, but in my case there is none.

Here I am mainly asking which design is better in terms of scalability and performance. Let's take the example of a parking lot, where I can keep the data either entirely
in a distributed cache with persistence (if required), such as Redis, or in a DBMS such as Oracle or MySQL. To survive a server crash or node failure, I can run a set of Redis replicas.

So the question is: based purely on performance, which is better?

As I understand it, there are two approaches to choosing between them (cache vs DB):

  1. One way is the passive approach: implement both the DB and the cache approach, simulate the load, and analyze the results. But I believe this is time-consuming,
    and it also needs development and hardware resources.

  2. The other is theoretical, and this is my question: how can we analyze both approaches theoretically for a given load? Assume the system receives 10 parking requests every millisecond.

My understanding is that a cache-based strategy will be more useful, as we can save I/O operations. But how can I calculate this theoretically and arrive at a concrete conclusion? How can I work out something like:
10K requests per second will take approximately this much time with persistence (on this infrastructure) and this much time with the cache-based approach (on this infrastructure)? The result need not be 100% correct, just close.

Here is a brief design of the parking lot, for reference:

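import java.util.Vector;

public class ParkingLot 
{
    // Spaces that are currently free and spaces that are currently occupied
    Vector<ParkingSpace> vacantParkingSpaces = null;
    Vector<ParkingSpace> fullParkingSpaces = null;

    // Total number of spaces in the lot
    int parkingSpaceCount = 0;

    boolean isFull;
    boolean isEmpty;

.................
}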

public class ParkingSpace 
{
    boolean isVacant;
    Vehicle vehicle;
    ParkingType parkingType;
    int distance;
}

public class Vehicle 
{
    int num;
}

public enum ParkingType
{
    REGULAR,
    HANDICAPPED,
    COMPACT,
    MAX_PARKING_TYPE,
}

Best Answer

You may consider the following:

  • While Redis can persist data, this was not its original goal. It was originally designed as a cache solution, not a database. This is especially important if you expect to stay flexible and be able, eventually, to switch to another cache solution such as memcached.

  • Redis is extremely fast when data is not persisted. If persistence is enabled, I would expect Redis's performance to be comparable to that of other databases such as Apache Cassandra.

  • Depending on the structure of your data, a key-value store may or may not be the optimal solution. Maybe it would make sense to store the data as documents, in which case a database such as MongoDB would be appropriate. Or maybe you need a strong relational model, and therefore an RDBMS would be a good choice.

    Note that it's not about being able to represent your data using this or that structure: in most cases (with some very narrow exceptions), the same data could be represented as a document, or as a bunch of key-values or using a relational model. It's more about the best suited way to store the data.
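To make that note concrete, here is a minimal sketch that renders the same parking space in three shapes. Everything in it is an assumption for illustration only: the `space:<id>:<field>` key naming, the JSON field names, and the `parking_space` table do not come from the question.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ParkingSpaceRepresentations {

    /** Key-value form: one flat key per field (hypothetical "space:<id>:<field>" naming). */
    static Map<String, String> asKeyValues(int id, boolean vacant, String type, int distance) {
        Map<String, String> kv = new LinkedHashMap<>();
        kv.put("space:" + id + ":vacant", String.valueOf(vacant));
        kv.put("space:" + id + ":type", type);
        kv.put("space:" + id + ":distance", String.valueOf(distance));
        return kv;
    }

    /** Document form: the whole space as a single JSON document. */
    static String asDocument(int id, boolean vacant, String type, int distance) {
        return "{\"id\":" + id + ",\"vacant\":" + vacant
                + ",\"type\":\"" + type + "\",\"distance\":" + distance + "}";
    }

    /**
     * Relational form: one row in a hypothetical parking_space table.
     * Shown as SQL text for readability only; real code would bind the
     * values through a PreparedStatement instead of concatenating them.
     */
    static String asInsertStatement(int id, boolean vacant, String type, int distance) {
        return "INSERT INTO parking_space (id, vacant, type, distance) VALUES ("
                + id + ", " + (vacant ? "TRUE" : "FALSE") + ", '" + type + "', " + distance + ")";
    }

    public static void main(String[] args) {
        System.out.println(asKeyValues(7, false, "COMPACT", 12));
        System.out.println(asDocument(7, false, "COMPACT", 12));
        System.out.println(asInsertStatement(7, false, "COMPACT", 12));
    }
}
```

All three carry the same information; what differs is how naturally each store lets you query and update it, which is the point made above.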

Let's suppose that both a key-value store and an RDBMS seem to be a good fit. Which one would be faster? (Similarly, imagine you have to choose between several RDBMSs or several document databases, all of which match your needs; which one would give you higher performance?)

To get a reliable result, you have little choice but to implement a basic test scenario with both solutions and compare them directly by running them on the same hardware. This is exactly what you suggested in your first point. You don't need to code the whole data access layer of your application twice; just take a handful of operations that are reasonably representative of the application, write a test around them, and run it repeatedly for some time.
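Such a test can be quite small. The following is only a sketch under stated assumptions: `ParkingStore`, `InMemoryStore` and `measure` are hypothetical names, and the in-memory implementation is a stand-in so the code runs without any external service. For a real comparison you would write one `ParkingStore` implementation backed by Redis and one backed by your RDBMS, using whichever client libraries you already have, and run the same loop against each on the same hardware.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class StoreBenchmark {

    /** Hypothetical abstraction over the store under test (cache or database). */
    interface ParkingStore {
        void park(int spaceId, int vehicleNum);
        void leave(int spaceId);
    }

    /** In-memory stand-in; replace with implementations backed by the real stores. */
    static class InMemoryStore implements ParkingStore {
        private final Map<Integer, Integer> occupied = new ConcurrentHashMap<>();

        @Override
        public void park(int spaceId, int vehicleNum) {
            occupied.put(spaceId, vehicleNum);
        }

        @Override
        public void leave(int spaceId) {
            occupied.remove(spaceId);
        }

        int occupiedCount() {
            return occupied.size();
        }
    }

    /** Runs a park/leave mix and returns the throughput in operations per second. */
    static double measure(ParkingStore store, int iterations, int spaces) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            int spaceId = i % spaces;
            store.park(spaceId, i);
            store.leave(spaceId);
        }
        long elapsedNanos = Math.max(1, System.nanoTime() - start);
        // Each iteration performs two operations: one park and one leave.
        return (2.0 * iterations) * 1_000_000_000.0 / elapsedNanos;
    }

    public static void main(String[] args) {
        ParkingStore store = new InMemoryStore();
        measure(store, 100_000, 1_000); // warm-up run, result discarded
        for (int run = 1; run <= 5; run++) {
            System.out.printf("run %d: %.0f ops/s%n", run, measure(store, 100_000, 1_000));
        }
    }
}
```

A single-threaded loop like this measures round-trip cost per operation only. To approximate the 10 requests per millisecond from the question, you would also need to drive the store from several threads and look at latency percentiles, not just the average throughput.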

As for the theoretical computations, just don't. If you're a full-time database administrator with expertise in both of the databases you're considering for your application, you may possibly find something useful (although with ten years' experience in both databases, you would probably already know which of the two to use anyway). If not, your computations won't be reliable, or will simply be wrong and misleading, while taking more time than writing the tests would have taken in the first place.
