How to benchmark concurrent key-value stores

concurrency, hashing, parallel-programming, performance

I have several concurrent key-value store implementations, based on hash tables and search trees, that I would like to compare. I want to benchmark them with a real-world application in which several threads stress the key-value stores.

I already have a micro-benchmark that stresses the key-value stores by executing random operations on them in parallel. What I'm interested in now are applications that do work that could be useful in the "real world" and where one or several key-value stores are important for scalability and speed. I would like a benchmark that is easy to set up and that can be run on many different systems. I would prefer one that does not involve network communication.
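For reference, here is a minimal sketch of the kind of micro-benchmark I mean. It uses `java.util.concurrent.ConcurrentHashMap` as a placeholder for the store under test; the thread count, key space, and operation mix are arbitrary:

```java
import java.util.concurrent.*;

public class RandomOpsBench {
    public static void main(String[] args) throws InterruptedException {
        // Placeholder store; swap in the implementation under test.
        ConcurrentMap<Integer, Integer> store = new ConcurrentHashMap<>();
        int threads = Runtime.getRuntime().availableProcessors();
        int opsPerThread = 1_000_000;
        int keySpace = 100_000;

        ExecutorService pool = Executors.newFixedThreadPool(threads);
        long start = System.nanoTime();
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                ThreadLocalRandom rnd = ThreadLocalRandom.current();
                for (int i = 0; i < opsPerThread; i++) {
                    int key = rnd.nextInt(keySpace);
                    int p = rnd.nextInt(100);
                    if (p < 10)      store.put(key, i);  // 10% inserts/updates
                    else if (p < 15) store.remove(key);  // 5% deletes
                    else             store.get(key);     // 85% lookups
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
        long elapsed = System.nanoTime() - start;
        long totalOps = (long) threads * opsPerThread;
        System.out.printf("%d ops in %.2f s (%.0f ops/s)%n",
                totalOps, elapsed / 1e9, totalOps / (elapsed / 1e9));
    }
}
```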

An example of the kind of application I'm looking for is the parallel PageRank algorithm. It is used to benchmark different key-value stores in the paper "Concurrent Tries with Efficient Non-Blocking Snapshots" (PPoPP '12).
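To make the example concrete, here is a rough sketch of running PageRank iterations over a concurrent map. The tiny hard-coded graph, damping factor, and fixed iteration count are my own simplifications, not the setup from the paper:

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class PageRankSketch {
    // Toy adjacency list: node -> outgoing links.
    static final Map<Integer, List<Integer>> LINKS = Map.of(
            0, List.of(1, 2),
            1, List.of(2),
            2, List.of(0));
    static final double DAMPING = 0.85;

    public static void main(String[] args) {
        int n = LINKS.size();
        ConcurrentHashMap<Integer, Double> ranks = new ConcurrentHashMap<>();
        LINKS.keySet().forEach(node -> ranks.put(node, 1.0 / n));

        for (int iter = 0; iter < 20; iter++) {
            ConcurrentHashMap<Integer, Double> next = new ConcurrentHashMap<>();
            // Each node scatters its rank to its targets in parallel;
            // the concurrent map absorbs the concurrent updates.
            LINKS.entrySet().parallelStream().forEach(e -> {
                double share = ranks.get(e.getKey()) / e.getValue().size();
                for (int target : e.getValue()) {
                    next.merge(target, DAMPING * share, Double::sum);
                }
            });
            // Add the teleport term for every node.
            LINKS.keySet().forEach(node ->
                    next.merge(node, (1 - DAMPING) / n, Double::sum));
            ranks.clear();
            ranks.putAll(next);
        }
        ranks.forEach((node, rank) ->
                System.out.printf("node %d: %.4f%n", node, rank));
    }
}
```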

The reasons why I'm not satisfied with "artificial" benchmarks that do X% inserts, Y% deletes and Z% lookups are:

  1. A benchmark that also solves a real-world problem is more convincing. The risk with "artificial" benchmarks is that they might not correspond to any real-world situation.
  2. Some usage scenarios that occur frequently in real-world applications might not be covered by an artificial benchmark.

Best Answer

Why not take your existing random KVP (key-value pair) operation testing to the next level?

Presumably, your current set of tests maintains a list of potential KVPs and performs CRUD operations against whichever KVP is selected. In effect, the list of KVPs drives the benchmarks against your system: an Actor randomly selects a KVP and then picks a CRUD op.

The next logical stage is to create sets of operations which will "replace" your list of potential KVPs as the driver. The sets of operations will reflect what you think a "real" workload will be. In some cases, a set will still be CRUD ops on KVPs. In other cases, as you mentioned, it will include additional steps (aka "real work"), and it's the aggregate of those operations that makes up the set.

Now your Actor will select from the list of sets instead of from the list of KVPs. Bonus points if you make your Actor intelligent enough to pick relative workloads, so some percentage would be CRUD on KVPs and some other percentage would be "real work."
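A rough sketch of what that weighted selection could look like is below; the `OperationSet` interface, the weights, and the store type are placeholders to be tailored to your expected workload:

```java
import java.util.List;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ThreadLocalRandom;

public class WorkloadActor implements Runnable {
    // An operation set bundles the KVP ops (and any extra work) that
    // make up one logical unit of the workload.
    interface OperationSet {
        void run(ConcurrentMap<Integer, Integer> store);
    }

    private final ConcurrentMap<Integer, Integer> store;
    private final List<OperationSet> sets;  // candidate workloads
    private final int[] cumulativeWeights;  // e.g. {70, 90, 100} for 70/20/10%
    private final int iterations;

    WorkloadActor(ConcurrentMap<Integer, Integer> store, List<OperationSet> sets,
                  int[] cumulativeWeights, int iterations) {
        this.store = store;
        this.sets = sets;
        this.cumulativeWeights = cumulativeWeights;
        this.iterations = iterations;
    }

    @Override
    public void run() {
        ThreadLocalRandom rnd = ThreadLocalRandom.current();
        for (int i = 0; i < iterations; i++) {
            int roll = rnd.nextInt(100);
            // Pick the first set whose cumulative weight exceeds the roll.
            for (int s = 0; s < sets.size(); s++) {
                if (roll < cumulativeWeights[s]) {
                    sets.get(s).run(store);
                    break;
                }
            }
        }
    }
}
```

Each `OperationSet` can be plain CRUD against the store or a composite that does "real work" between store accesses; the cumulative weights control the relative mix.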

This approach doesn't fully address your concerns with "artificial" benchmarks, but I don't know that any solution in the abstract can really resolve that issue. In theory, you know the expected workload best, so you can tailor those sets of operations accordingly.

The benefit of this approach is that you can now state: "The system can handle ### transactions of X% inserts, Y% deletes, Z% lookups and Q% 'real world' operations." And you'll add a parenthetical remark explaining what "real world" means to you.
