Uses of Persistent Data Structures in Non-Functional Languages

concurrency, data structures, functional programming, multithreading

Languages that are purely or nearly purely functional benefit from persistent data structures because such structures are immutable and fit well with the stateless style of functional programming.

But from time to time we see libraries of persistent data structures for (state-based, OOP) languages like Java. A claim often heard in favor of persistent data structures is that because they are immutable, they are thread-safe.

However, the reason persistent data structures are thread-safe is that if one thread "adds" an element to a persistent collection, the operation returns a new collection like the original but with the element added. Other threads therefore still see the original collection. The two collections share a lot of internal state, of course; that sharing is what makes these persistent structures efficient.
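The "add returns a new collection that shares state with the original" behavior can be seen in a minimal persistent linked list. This is only a sketch: the class name `PList` and its methods are hypothetical and not taken from any particular library. Prepending allocates one node and reuses the whole old list as its tail, so the original version is never touched.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal persistent (immutable) singly linked list; a sketch, not a library API.
final class PList<T> {
    private static final PList<?> EMPTY = new PList<>(null, null);

    final T head;         // element stored in this node (null only for EMPTY)
    final PList<T> tail;  // rest of the list: shared between versions, never copied

    private PList(T head, PList<T> tail) {
        this.head = head;
        this.tail = tail;
    }

    @SuppressWarnings("unchecked")
    static <T> PList<T> empty() {
        return (PList<T>) EMPTY;
    }

    boolean isEmpty() {
        return this == EMPTY;
    }

    // "Adding" returns a new list; the receiver is untouched and becomes the new list's tail.
    PList<T> prepend(T value) {
        return new PList<>(value, this);
    }

    // Copies the elements into an ordinary java.util.List for printing or comparison.
    List<T> toList() {
        List<T> out = new ArrayList<>();
        for (PList<T> p = this; !p.isEmpty(); p = p.tail) {
            out.add(p.head);
        }
        return out;
    }
}

class PListDemo {
    public static void main(String[] args) {
        PList<String> original = PList.<String>empty().prepend("b").prepend("a");
        PList<String> extended = original.prepend("x");
        System.out.println(original.toList());         // [a, b]
        System.out.println(extended.toList());         // [x, a, b]
        System.out.println(extended.tail == original); // true: structure is shared
    }
}
```

A thread holding `original` keeps seeing `[a, b]` no matter how many versions other threads derive from it.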

But since different threads see different states of data, it would seem that persistent data structures are not in themselves sufficient to handle scenarios where one thread makes a change that is visible to other threads. For this, it seems we must use devices such as atoms, references, software transactional memory, or even classic locks and synchronization mechanisms.
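A common pattern is to combine the two: keep the data persistent and put a single mutable reference in front of it. In Java, `java.util.concurrent.atomic.AtomicReference` plays roughly the role of a Clojure atom. The sketch below assumes Java 16+ for records, and the names `Cell` and `SharedLog` are hypothetical. Writers publish a new version with one atomic swap, and readers get a consistent snapshot without any lock.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical immutable cons cell; null plays the role of the empty list (an assumption for brevity).
record Cell<T>(T head, Cell<T> tail) {}

class SharedLog {
    // The AtomicReference is the only mutable part; every value it ever holds is immutable.
    private final AtomicReference<Cell<String>> ref = new AtomicReference<>(null);

    // Publish a new version atomically. updateAndGet may re-run the lambda under
    // contention, which is safe here because building a Cell has no side effects.
    void add(String entry) {
        ref.updateAndGet(old -> new Cell<>(entry, old));
    }

    // A single read yields a consistent snapshot that later add() calls cannot change.
    Cell<String> snapshot() {
        return ref.get();
    }

    static int size(Cell<?> c) {
        int n = 0;
        for (; c != null; c = c.tail()) n++;
        return n;
    }

    public static void main(String[] args) throws InterruptedException {
        SharedLog log = new SharedLog();
        List<Thread> threads = new ArrayList<>();
        for (int t = 0; t < 4; t++) {
            int id = t;
            Thread th = new Thread(() -> {
                for (int i = 0; i < 1000; i++) log.add("t" + id + ":" + i);
            });
            threads.add(th);
            th.start();
        }
        for (Thread th : threads) th.join();
        System.out.println(size(log.snapshot())); // 4000: no update is lost
    }
}
```

The persistent structure is what makes the snapshot cheap and safe to hand out; the `AtomicReference` is what makes a change visible to other threads.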

Why, then, is the immutability of persistent data structures (PDSs) touted as something beneficial for "thread safety"? Are there any real examples where PDSs help in synchronization or in solving concurrency problems? Or are PDSs simply a way to provide a stateless interface to an object in support of a functional programming style?

Best Answer

Persistent/immutable data structures don't solve concurrency problems on their own, but they make solving them much easier.

Consider a thread T1 that passes a set S to another thread T2. If S is mutable, T1 has a problem: it loses control of what happens to S. Thread T2 can modify it, so T1 can't rely on the content of S at all. The reverse holds too: T2 can't be sure that T1 doesn't modify S while T2 operates on it.

One solution is to add some kind of contract to the communication between T1 and T2 so that only one of the threads is allowed to modify S. This is error-prone and burdens both the design and the implementation.

Another solution is for T1 or T2 (or both, if they aren't coordinated) to clone the data structure. However, if S isn't persistent, cloning is an expensive O(n) operation.
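With an ordinary mutable Java collection, the cloning approach looks like the sketch below; the class and method names are hypothetical. `new HashSet<>(s)` visits every element, which is the O(n) cost mentioned above. Wrapping with `Collections.unmodifiableSet(s)` would not be a substitute: it is a read-only view, so T1 could still mutate `s` underneath T2.

```java
import java.util.HashSet;
import java.util.Set;

class DefensiveCopy {
    // Hypothetical helper: without persistence, T1 copies S before handing it off.
    // Every element is copied, so this costs O(n) time and memory.
    static Set<Integer> handOff(Set<Integer> s) {
        return new HashSet<>(s);
    }

    public static void main(String[] args) throws InterruptedException {
        Set<Integer> s = new HashSet<>(Set.of(1, 2, 3));
        Set<Integer> copyForT2 = handOff(s);
        Thread t2 = new Thread(() -> copyForT2.add(99)); // T2 mutates only its own copy
        t2.start();
        t2.join(); // join() makes T2's change visible to this thread
        System.out.println(s.contains(99));         // false
        System.out.println(copyForT2.contains(99)); // true
    }
}
```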

If you have a persistent data structure, you're free of this burden. You can pass a structure to another thread and you don't have to care what it does with it. Both threads have access to the original version and can perform arbitrary operations on it; because every "modification" produces a new version, nothing one thread does influences what the other thread sees.
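For contrast with copying, here is a hypothetical persistent set, sketched as an unbalanced binary search tree with path copying (a real library would use a balanced tree or a hash trie, so treat this only as an illustration). Inserting rebuilds only the nodes on the search path and shares every other subtree, so T1 can hand its set to T2 with no copy and no ownership agreement. The names `PTreeSet` and `HandOffDemo` are assumptions; records require Java 16+.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical persistent set of ints: an unbalanced BST using path copying.
final class PTreeSet {
    static final PTreeSet EMPTY = new PTreeSet(null);

    private record Node(int key, Node left, Node right) {}

    private final Node root;

    private PTreeSet(Node root) { this.root = root; }

    boolean contains(int key) {
        Node n = root;
        while (n != null) {
            if (key == n.key()) return true;
            n = key < n.key() ? n.left() : n.right();
        }
        return false;
    }

    // Returns a new set; the receiver is unchanged.
    PTreeSet add(int key) {
        return new PTreeSet(insert(root, key));
    }

    // Rebuilds only the nodes on the search path; all other subtrees are shared.
    private static Node insert(Node n, int key) {
        if (n == null) return new Node(key, null, null);
        if (key < n.key()) return new Node(n.key(), insert(n.left(), key), n.right());
        if (key > n.key()) return new Node(n.key(), n.left(), insert(n.right(), key));
        return n; // already present: reuse the existing node
    }
}

class HandOffDemo {
    public static void main(String[] args) throws InterruptedException {
        PTreeSet s = PTreeSet.EMPTY.add(2).add(1).add(3);
        AtomicReference<PTreeSet> t2Result = new AtomicReference<>();
        // T1 passes s to T2 directly: no copy, no contract about who may modify it.
        Thread t2 = new Thread(() -> t2Result.set(s.add(99)));
        t2.start();
        t2.join();
        System.out.println(s.contains(99));              // false: T1's version is unchanged
        System.out.println(t2Result.get().contains(99)); // true: T2's new version
    }
}
```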

Related Solutions

What are the consequences of immutable classes with references to mutable classes

A class isn't truly immutable unless all of its child references are immutable as well. If your root class holds final references to all of its instance variables, all of those instance variables are immutable, and all of their children follow the same restriction, then you can say the root class is immutable.

If any child reference is non-final, or any child instance is mutable, then the containing class is not immutable.
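A small Java illustration of the difference, using hypothetical class names. `LeakyPoint` has a final field and no setters, yet its state can still change because the list it references is mutable and shared with the caller. `SafePoint` stores an unmodifiable copy (via `List.copyOf`, Java 10+, which also rejects null elements), so it is immutable all the way down as long as the elements themselves are immutable, as `Integer` is.

```java
import java.util.ArrayList;
import java.util.List;

// Looks immutable (final class, final field, no setters) but is not:
// the referenced List is mutable and shared with whoever constructed it.
final class LeakyPoint {
    private final List<Integer> coords;
    LeakyPoint(List<Integer> coords) { this.coords = coords; }
    List<Integer> coords() { return coords; }
}

// Immutable all the way down: final field AND an unmodifiable, unshared list of immutable Integers.
final class SafePoint {
    private final List<Integer> coords;
    SafePoint(List<Integer> coords) { this.coords = List.copyOf(coords); }
    List<Integer> coords() { return coords; }
}

class ImmutabilityDemo {
    public static void main(String[] args) {
        List<Integer> src = new ArrayList<>(List.of(1, 2));
        LeakyPoint leaky = new LeakyPoint(src);
        SafePoint safe = new SafePoint(src);
        src.add(3);
        System.out.println(leaky.coords()); // [1, 2, 3]: changed behind the "immutable" object
        System.out.println(safe.coords());  // [1, 2]
    }
}
```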

It doesn't matter what "internal immutability" might mean (that is anyone's guess); all that matters is that the class's public immutability contract is kept. You can't be partially immutable any more than you can be partially dead. Your class is either immutable or it isn't, and it isn't if any of the classes it refers to (or the classes they refer to, and so on) are mutable.

If you have an object graph and any member of the graph is mutable, you can't say with certainty that you have an immutable state. Concurrency guarantees go out the window, .equals() and .hashCode() can return different results over time, and cloning and serialization, which are simple to test for immutable objects, go out the window as well.
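The `.hashCode()` problem is easy to reproduce in Java. In the sketch below (class names hypothetical), `Tags` bases `equals` and `hashCode` on a list it does not own. Once that list is mutated, a `HashSet` that already contains the object can no longer find it, because the lookup uses the new hash code while the entry was stored under the old one.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Final field, no setters, but equals/hashCode depend on a mutable child list.
final class Tags {
    private final List<String> values;
    Tags(List<String> values) { this.values = values; }

    @Override public boolean equals(Object o) {
        return o instanceof Tags t && values.equals(t.values);
    }

    @Override public int hashCode() {
        return values.hashCode();
    }
}

class HashDemo {
    public static void main(String[] args) {
        List<String> backing = new ArrayList<>(List.of("x"));
        Tags tags = new Tags(backing);
        Set<Tags> set = new HashSet<>();
        set.add(tags);
        backing.add("y"); // mutating a child changes tags.hashCode()
        System.out.println(set.contains(tags));            // false: lookup uses the new hash
        System.out.println(set.iterator().next() == tags); // true: the object is still stored
    }
}
```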

If anything breaks this immutability contract, you lose all the benefits of trying to maintain it.

Simpler concurrency is a main motivation for immutability.

Side-effect-free code helps with predictability and maintainability: you don't have to wonder where things are getting mutated in the call tree, because you know they can't be mutated.

Performance is another, less important factor. In Java, the JVM can make some highly optimized decisions about caching and other matters if it knows data can't change state; immutability provides useful hints to the compiler at compile time and to the JIT at runtime.

Functional Programming – Data Structures in Functional Programming

It's been a while since I've worked in LISP, but as I recall, the basic non-atomic structure is the list, and a great deal is built on top of it. So you could represent a graph as a list of entries, where each entry is a node followed by a list of the nodes it connects to. I'm sure there are other ways to do it too.

Maybe something like this:

'(
  (a (b c))
  (b (a c))
  (c (a b d))
  (d (c))
)
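Since the question is about Java, here is the same adjacency-list idea as an immutable Java map, a sketch using only `Map.of` and `List.of` (Java 9+). The class name `AdjacencyGraph` and the `reachableFrom` helper are hypothetical; the breadth-first search just shows that the structure is directly usable.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class AdjacencyGraph {
    // Same graph as the list above: each node maps to the nodes it connects to.
    static final Map<String, List<String>> GRAPH = Map.of(
        "a", List.of("b", "c"),
        "b", List.of("a", "c"),
        "c", List.of("a", "b", "d"),
        "d", List.of("c"));

    // Breadth-first search over the (unmodifiable) adjacency map.
    static Set<String> reachableFrom(String start) {
        Set<String> seen = new HashSet<>(Set.of(start));
        Deque<String> queue = new ArrayDeque<>(List.of(start));
        while (!queue.isEmpty()) {
            for (String next : GRAPH.getOrDefault(queue.poll(), List.of())) {
                if (seen.add(next)) queue.add(next);
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(GRAPH.get("c"));             // [a, b, d]
        System.out.println(reachableFrom("d").size());  // 4
    }
}
```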

could give a graph like this:

a<-->b<-->c<-->d
^         ^
|         |
+---------+

If you want to get fancy, you could add weights to it as well:

'(
  (a (b 1.0 c 2.0))
  (b (a 1.0 c 1.0))
  (c (a 1.3 b 7.2 d 10.5))
  (d (c -10.5))
)
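The weighted list translates to a nested immutable map in Java: node, then neighbor, then weight. In this sketch (class and method names hypothetical) the weights are copied exactly from the list above. Note that a to c and c to a carry different weights there, so the edges behave as directed.

```java
import java.util.Map;

class WeightedGraph {
    // node -> (neighbor -> weight), copied from the weighted list above.
    static final Map<String, Map<String, Double>> GRAPH = Map.of(
        "a", Map.of("b", 1.0, "c", 2.0),
        "b", Map.of("a", 1.0, "c", 1.0),
        "c", Map.of("a", 1.3, "b", 7.2, "d", 10.5),
        "d", Map.of("c", -10.5));

    // Weight of the edge from -> to, or null if there is no such edge.
    static Double weight(String from, String to) {
        return GRAPH.getOrDefault(from, Map.of()).get(to);
    }

    public static void main(String[] args) {
        System.out.println(weight("a", "c")); // 2.0
        System.out.println(weight("c", "a")); // 1.3
        System.out.println(weight("a", "d")); // null
    }
}
```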

You might also be interested in CL-Graph (found by Google-searching the phrase "lisp graph structure").
