Data Structures – How Immutable Sets Are Manipulated

data structuresimmutabilityperformance

When working with an immutable set or map, like the ones found in many functional programming languages, operations that would otherwise modify the container generate a new container instead.

I know that most list operations in functional languages do not result in a copy of the list and just rearrange pointers. This is why working with lists is extremely efficient.

I am curious whether immutable maps are similar or if an entirely new map is created after each operation. I am asking because a library I wrote manipulates a lot of maps and I am curious whether I will see a performance boost if I switch over to immutable maps. Currently, I am just looping over my map adding or throwing away key/value pairs as I go.

Best Answer

As far as I can tell, functional sets are generally implemented as trees, so sharing nodes between concurrent versions makes sense. Some functional languages, notably F# and Clojure, open source their code on github, you can look there for concrete details. F# uses trees.

Some time ago, I have been comparing performance of F# immutable Set (Microsoft.FSharp.Collections) vs mutable .NET HashSet (System.Collections.Generic). I do not have results available to share now, but as far as I remember lookup / union / intersection times were similar for both and when adding large numbers of entries the immutable set performed slower by some low constant factor (something around 3 or 4).

Apart from the book, there's also Okasaki's thesis available, which was the foundation for the book - glimpsed through it and it looks rather hard (well, like a proper thesis should), though you might find it useful.