How to benchmark concurrent key-value stores

concurrency, hashing, parallel-programming, performance

I have several concurrent key-value store implementations, based on hash tables and search trees, that I would like to compare. I want to benchmark them with a real-world application in which several threads stress the key-value stores.

I already have a micro-benchmark that stresses the key-value stores by executing random operations on them in parallel. What I'm interested in now are applications that do work that could be useful in the "real world" and where one or several key-value stores are important for scalability and speed. I would like a benchmark that is easy to set up and that can be run on many different systems. I would prefer one that does not involve network communication.
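For reference, here is a minimal sketch of the kind of micro-benchmark I mean. It uses `java.util.concurrent.ConcurrentHashMap` as a placeholder for the store under test; the thread count, key space, and operation mix are arbitrary:

```java
import java.util.concurrent.*;

public class RandomOpsBench {
    public static void main(String[] args) throws InterruptedException {
        // Placeholder store; swap in the implementation under test.
        ConcurrentMap<Integer, Integer> store = new ConcurrentHashMap<>();
        int threads = Runtime.getRuntime().availableProcessors();
        int opsPerThread = 1_000_000;
        int keySpace = 100_000;

        ExecutorService pool = Executors.newFixedThreadPool(threads);
        long start = System.nanoTime();
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                ThreadLocalRandom rnd = ThreadLocalRandom.current();
                for (int i = 0; i < opsPerThread; i++) {
                    int key = rnd.nextInt(keySpace);
                    int p = rnd.nextInt(100);
                    if (p < 10)      store.put(key, i);  // 10% inserts/updates
                    else if (p < 15) store.remove(key);  // 5% deletes
                    else             store.get(key);     // 85% lookups
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
        long elapsed = System.nanoTime() - start;
        long totalOps = (long) threads * opsPerThread;
        System.out.printf("%d ops in %.2f s (%.0f ops/s)%n",
                totalOps, elapsed / 1e9, totalOps / (elapsed / 1e9));
    }
}
```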

An example of the kind of application I'm looking for is the parallel PageRank algorithm. It is used to benchmark different key-value stores in the paper "Concurrent Tries with Efficient Non-Blocking Snapshots" (PPoPP '12).
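To make the example concrete, here is a rough sketch of running PageRank iterations over a concurrent map. The tiny hard-coded graph, damping factor, and fixed iteration count are my own simplifications, not the setup from the paper:

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class PageRankSketch {
    // Toy adjacency list: node -> outgoing links.
    static final Map<Integer, List<Integer>> LINKS = Map.of(
            0, List.of(1, 2),
            1, List.of(2),
            2, List.of(0));
    static final double DAMPING = 0.85;

    public static void main(String[] args) {
        int n = LINKS.size();
        ConcurrentHashMap<Integer, Double> ranks = new ConcurrentHashMap<>();
        LINKS.keySet().forEach(node -> ranks.put(node, 1.0 / n));

        for (int iter = 0; iter < 20; iter++) {
            ConcurrentHashMap<Integer, Double> next = new ConcurrentHashMap<>();
            // Each node scatters its rank to its targets in parallel;
            // the concurrent map absorbs the concurrent updates.
            LINKS.entrySet().parallelStream().forEach(e -> {
                double share = ranks.get(e.getKey()) / e.getValue().size();
                for (int target : e.getValue()) {
                    next.merge(target, DAMPING * share, Double::sum);
                }
            });
            // Add the teleport term for every node.
            LINKS.keySet().forEach(node ->
                    next.merge(node, (1 - DAMPING) / n, Double::sum));
            ranks.clear();
            ranks.putAll(next);
        }
        ranks.forEach((node, rank) ->
                System.out.printf("node %d: %.4f%n", node, rank));
    }
}
```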

The reasons why I'm not satisfied with "artificial" benchmarks that do X% inserts, Y% deletes and Z% lookups are:

  1. A benchmark that also solves a real-world problem is more convincing. The risk with "artificial" benchmarks is that they might not correspond to any real-world situation.
  2. Some usage scenarios that occur frequently in real-world applications might not be covered by an artificial benchmark.

Best Answer

Why not take your existing random KVP (key-value pair) operation testing to the next level?

Presumably, your current set of tests maintains a list of potential KVPs and performs CRUD operations against whichever KVP is selected. In effect, the list of KVPs drives the benchmarks against your system: an Actor randomly selects a KVP and then picks a CRUD op.

The next logical stage is to create sets of operations which will "replace" your list of potential KVPs as the driver. The sets of operations will reflect what you think a "real" workload will be. In some cases, a set will still be CRUD ops on KVPs. In other cases, as you mentioned, it will include additional steps (aka "real work"), and it's the aggregate of those operations that makes up the set.

Now your Actor will select from the list of sets instead of from the list of KVPs. Bonus points if you make your Actor intelligent enough to pick relative workloads, so some percentage would be CRUD on KVPs and some other percentage would be "real work."
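A rough sketch of what that weighted selection could look like is below; the `OperationSet` interface, the weights, and the store type are placeholders to be tailored to your expected workload:

```java
import java.util.List;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ThreadLocalRandom;

public class WorkloadActor implements Runnable {
    // An operation set bundles the KVP ops (and any extra work) that
    // make up one logical unit of the workload.
    interface OperationSet {
        void run(ConcurrentMap<Integer, Integer> store);
    }

    private final ConcurrentMap<Integer, Integer> store;
    private final List<OperationSet> sets;  // candidate workloads
    private final int[] cumulativeWeights;  // e.g. {70, 90, 100} for 70/20/10%
    private final int iterations;

    WorkloadActor(ConcurrentMap<Integer, Integer> store, List<OperationSet> sets,
                  int[] cumulativeWeights, int iterations) {
        this.store = store;
        this.sets = sets;
        this.cumulativeWeights = cumulativeWeights;
        this.iterations = iterations;
    }

    @Override
    public void run() {
        ThreadLocalRandom rnd = ThreadLocalRandom.current();
        for (int i = 0; i < iterations; i++) {
            int roll = rnd.nextInt(100);
            // Pick the first set whose cumulative weight exceeds the roll.
            for (int s = 0; s < sets.size(); s++) {
                if (roll < cumulativeWeights[s]) {
                    sets.get(s).run(store);
                    break;
                }
            }
        }
    }
}
```

Each `OperationSet` can be plain CRUD against the store or a composite that does "real work" between store accesses; the cumulative weights control the relative mix.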

This approach doesn't fully address your concerns with "artificial" benchmarks, but I don't know that any solution in the abstract can really resolve that issue. In theory, you know the expected workload best, so you can tailor those sets of operations accordingly.

The benefit of this approach is that you can now state: "The system can handle ### transactions of X% inserts, Y% deletes, Z% lookups and Q% 'real world' operations." And you'll add a parenthetical remark explaining what "real world" means to you.
