Uses of Persistent Data Structures in Non-Functional Languages

concurrencydata structuresfunctional programmingmultithreading

Languages that are purely functional or near-purely functional benefit from persistent data structures because they are immutable and fit well with the stateless style of functional programming.

But from time to time we see libraries of persistent data structures for (state-based, OOP) languages like Java. A claim often heard in favor of persistent data structures is that because they are immutable, they are thread-safe.

However, the reason that persistent data structures are thread-safe is that if one thread were to "add" an element to a persistent collection, the operation returns a new collection like the original but with the element added. Other threads therefore see the original collection. The two collections share a lot of internal state, of course — that's why these persistent structures are efficient.

But since different threads see different states of data, it would seem that persistent data structures are not in themselves sufficient to handle scenarios where one thread makes a change that is visible to other threads. For this, it seems we must use devices such as atoms, references, software transactional memory, or even classic locks and synchronization mechanisms.

Why then, is the immutability of PDSs touted as something beneficial for "thread safety"? Are there any real examples where PDSs help in synchronization, or solving concurrency problems? Or are PDSs simply a way to provide a stateless interface to an object in support of a functional programming style?

Best Answer

Persistent/immutable data structures don't solve concurrency problems on their own, but they make solving them much easier.

Consider a thread T1 that passes a set S to another thread T2. If S is mutable, T1 has a problem: It loses control of what happens with S. Thread T2 can modify it, so T1 can't rely at all on content of S. And vice versa - T2 can't be sure that T1 doesn't modify S while T2 operates on it.

One solution is to add some kind of a contract to the communication of T1 and T2 so that only one of the threads is allowed to modify S. This is error prone and burdens both the design and implementation.

Another solution is that T1 or T2 clone the data structure (or both of them, if they aren't coordinated). However, if S isn't persistent, this is an expensive O(n) operation.

If you have a persistent data structure, you're free of this burden. You can pass a structure to another thread and you don't have to care what it does with it. Both threads have access to the original version and can do arbitrary operations on it - it doesn't influence what the other thread sees.

See also: persistent vs immutable data structure.