Scala – Why is Scala hashmap slow

hashmapjava-8scalascala-2.11

And what can be done about it?

I have run some tests and it seems that Scala Hashmap is much slower than a Java HashMap. Please prove me wrong!

For me the whole point of Hashmap is to get quick access to a value from a given key. So I find myself resorting to using a Java HashMap when speed matters, which is a bit sad. I'm not experienced enough to say for sure but it seems that the more you mix Java and Scala the more problems you are likely to face.

test("that scala hashmap is slower than java") {
    val javaMap = new util.HashMap[Int,Int](){
      for (i <- 1 to 20)
      put(i,i+1)
    }

    import collection.JavaConverters._
    val scalaMap = javaMap.asScala.toMap

    // check is a scala hashmap
    assert(scalaMap.getClass.getSuperclass === classOf[scala.collection.immutable.HashMap[Int,Int]])

    def slow = {
      val start = System.nanoTime()
      for (i <- 1 to 1000) {
        for (i <- 1 to 20) {
          scalaMap(i)
        }
      }
      System.nanoTime() - start
    }

    def fast = {
      val start = System.nanoTime()
      for (i <- 1 to 1000) {
        for (i <- 1 to 20) {
          javaMap.get(i)
        }
      }
      System.nanoTime() - start
    }

    val elapses: IndexedSeq[(Long, Long)] = {
      (1 to 1000).map({_ => (slow,fast)})
    }

    var elapsedSlow = 0L
    var elapsedFast = 0L
    for ((eSlow,eFast) <- elapses) {
      elapsedSlow += eSlow
      elapsedFast += eFast
    }

    assert(elapsedSlow > elapsedFast)

    val fraction : Double = elapsedFast.toDouble/elapsedSlow
    println(s"slower by factor of: $fraction")
}

Am I missing something?

Answer Summary

As of now, when comparing Java 8 to Scala 2.11, it appears that Java HashMap is notably speedier at lookups (for a low number of keys) than the Scala offerings – with the exception of LongMap (if your keys are Ints/Longs).

The performance difference is not so great that it should matter in most use cases. Hopefully Scala will improve the speed of their Maps. In the mean time, if you need performance (with non-integer keys) use Java.

Int keys, n=20
Long(60), Java(93), Open(170), MutableSc(243), ImmutableSc(317)

case object keys, n=20
Java(195), AnyRef(230)

Best Answer

First of all: doing JVM benchmarks using nanoTime is extremely error-prone. Use a microbenchmarking framework such as Thyme, Caliper or JMH

Second: you are comparing a mutable java hash map with an immutable scala hash map. Immutable collections can be remarkably fast, but there are some cases where they will never be as fast as mutable data structures.

Here is a proper microbenchmark of mutable java hash map vs. immutable scala hash map: https://gist.github.com/rklaehn/26c277b2b5666ec4b372

As you can see, the scala immutable map is a bit faster than the java mutable map. Note that this will not be the case once you go to larger maps, because an immutable data structure has to do some compromises to enable structural sharing. I would guess that in both cases, the dominant performance issue is boxing of the ints to Integer.

Update: if you really want a mutable hash hap with ints as keys, the right choice from the scala collections library is scala.collection.mutable.LongMap. This uses a long as key, and has much better performance than the generic Map because it does not have to box the value. See results from the gist.

Update 2: If your key extends from AnyRef (like e.g. a String), your best bet for a high performance mutable map is scala.collection.mutable.AnyRefMap