I could have stored the index of the Polygon in the current Scene and the index of the dragged point in the Polygon, and replaced them every time. But this approach does not scale: when composition goes five levels deep and further, the boilerplate becomes unbearable.
You're absolutely right: this approach doesn't scale if you can't get around the boilerplate - specifically, the boilerplate for creating a whole new Scene with a tiny subpart changed. However, many functional languages provide a construct for dealing with this sort of nested structure manipulation: lenses.
A lens is basically a getter and setter for immutable data. A lens focuses on some small part of a larger structure. Given a lens, there are two things you can do with it: you can view the small part of a value of the larger structure, or you can set the small part to a new value, producing a new value of the larger structure. For example, suppose you have a lens that focuses on the third item in a list:
thirdItemLens :: Lens [a] a
That type means the larger structure is a list of things, and the small subpart is one of those things. Given this lens, you can view and set the third item in the list:
> view thirdItemLens [1, 2, 3, 4, 5]
3
> set thirdItemLens 100 [1, 2, 3, 4, 5]
[1, 2, 100, 4, 5]
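Since the snippets above are pseudocode, here is one minimal way the same idea could be sketched in Python, representing a lens as a plain (getter, setter) pair. The names view, set_, and list_item_lens are illustrative, not from any particular library:

```python
# A lens is just a pair of functions: a getter and a setter.
# The setter returns a NEW structure; it never mutates the original.

def list_item_lens(n):
    """Lens focusing on the n-th item of a list (0-based)."""
    def getter(whole):
        return whole[n]
    def setter(new_part, whole):
        copy = list(whole)   # copy, so the original stays untouched
        copy[n] = new_part
        return copy
    return (getter, setter)

def view(lens, whole):
    getter, _ = lens
    return getter(whole)

def set_(lens, new_part, whole):
    _, setter = lens
    return setter(new_part, whole)

third_item_lens = list_item_lens(2)

print(view(third_item_lens, [1, 2, 3, 4, 5]))       # 3
print(set_(third_item_lens, 100, [1, 2, 3, 4, 5]))  # [1, 2, 100, 4, 5]
```

Note that set_ hands back a fresh list; the value you passed in is unchanged, which is the whole point of working with immutable data.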
The reason lenses are useful is that they are values representing getters and setters, and you can abstract over them the same way you can over other values. You can make functions that return lenses, for instance a listItemLens function which takes a number n and returns a lens viewing the nth item in a list. Additionally, lenses can be composed:
> firstLens = listItemLens 0
> thirdLens = listItemLens 2
> firstOfThirdLens = lensCompose firstLens thirdLens
> view firstOfThirdLens [[1, 2], [3, 4], [5, 6], [7, 8]]
5
> set firstOfThirdLens 100 [[1, 2], [3, 4], [5, 6], [7, 8]]
[[1, 2], [3, 4], [100, 6], [7, 8]]
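Continuing the Python sketch from above, lensCompose could be written like this. Again, the (getter, setter) representation and all names here are illustrative:

```python
def list_item_lens(n):
    """Lens focusing on the n-th item of a list (0-based)."""
    def getter(whole):
        return whole[n]
    def setter(new_part, whole):
        copy = list(whole)
        copy[n] = new_part
        return copy
    return (getter, setter)

def view(lens, whole):
    return lens[0](whole)

def set_(lens, new_part, whole):
    return lens[1](new_part, whole)

def lens_compose(inner, outer):
    """Compose two lenses: look through `outer` first, then `inner`."""
    def getter(whole):
        return view(inner, view(outer, whole))
    def setter(new_part, whole):
        old_inner_whole = view(outer, whole)
        new_inner_whole = set_(inner, new_part, old_inner_whole)
        return set_(outer, new_inner_whole, whole)
    return (getter, setter)

first_of_third = lens_compose(list_item_lens(0), list_item_lens(2))
data = [[1, 2], [3, 4], [5, 6], [7, 8]]

print(view(first_of_third, data))       # 5
print(set_(first_of_third, 100, data))  # [[1, 2], [3, 4], [100, 6], [7, 8]]
```

The composed setter only rebuilds the spine it walks through: the untouched sublists are shared between the old and new outer lists.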
Each lens encapsulates the behavior for traversing one level of the data structure. By combining them, you can eliminate the boilerplate for traversing multiple levels of complex structures. For instance, supposing you have a scenePolygonLens i that views the ith Polygon in a Scene, and a polygonPointLens n that views the nth Point in a Polygon, you can make a lens constructor for focusing on just the specific point you care about in an entire scene like so:
scenePointLens i n = lensCompose (polygonPointLens n) (scenePolygonLens i)
Now suppose a user clicks point 3 of polygon 14 and moves it 10 pixels right. You can update your scene like so:
lens = scenePointLens 14 3
point = view lens currentScene
newPoint = movePoint 10 0 point
newScene = set lens newPoint currentScene
This nicely contains all the boilerplate for traversing and updating a Scene inside lens; all you have to care about is what you want to change the point to. You can further abstract this with a lensTransform function that accepts a lens, a target, and a function for updating the view of the target through the lens:
lensTransform lens transformFunc target =
    let current = view lens target
        new = transformFunc current
    in  set lens new target
This takes a function and turns it into an "updater" on a complicated data structure, applying the function to only the view and using it to construct a new view. So going back to the scenario of moving the 3rd point of the 14th polygon to the right 10 pixels, that can be expressed in terms of lensTransform
like so:
lens = scenePointLens 14 3
moveRightTen point = movePoint 10 0 point
newScene = lensTransform lens moveRightTen currentScene
And that's all you need to update the whole scene. This is a very powerful idea and works very well when you have some nice functions for constructing lenses viewing the pieces of your data you care about.
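Since lensTransform above is pseudocode, here is one way the whole flow could be sketched in Python. The (getter, setter) representation, list_item_lens, and move_point are illustrative stand-ins for the Scene/Polygon types, and the toy scene uses small indices rather than polygon 14 / point 3:

```python
def list_item_lens(n):
    """Lens focusing on the n-th item of a list (0-based)."""
    def getter(whole):
        return whole[n]
    def setter(new_part, whole):
        copy = list(whole)
        copy[n] = new_part
        return copy
    return (getter, setter)

def view(lens, whole):
    return lens[0](whole)

def set_(lens, new_part, whole):
    return lens[1](new_part, whole)

def lens_compose(inner, outer):
    """Look through `outer` first, then `inner`, as in scenePointLens."""
    def getter(whole):
        return view(inner, view(outer, whole))
    def setter(new_part, whole):
        return set_(outer, set_(inner, new_part, view(outer, whole)), whole)
    return (getter, setter)

def lens_transform(lens, transform_func, target):
    """Apply transform_func to the focused part, returning a new whole."""
    return set_(lens, transform_func(view(lens, target)), target)

def move_point(dx, dy, point):
    x, y = point
    return (x + dx, y + dy)

# A toy Scene: a list of polygons; each polygon is a list of (x, y) points.
scene = [[(0, 0), (1, 0), (1, 1)],
         [(5, 5), (6, 5), (6, 6), (5, 6)]]

# Focus on point 2 of polygon 1 (the toy scene is smaller than 14 polygons).
point_lens = lens_compose(list_item_lens(2), list_item_lens(1))
new_scene = lens_transform(point_lens, lambda p: move_point(10, 0, p), scene)

print(new_scene[1][2])  # (16, 6)
print(scene[1][2])      # (6, 6): the original scene is untouched
```

Notice that polygon 0 is not copied at all: the new scene simply shares it with the old one.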
However this is all pretty out-there stuff currently, even in the functional programming community. It's difficult to find good library support for working with lenses, and even more difficult to explain how they work and what the benefits are to your coworkers. Take this approach with a grain of salt.
I would be very loath to call a data structure "immutable" unless, once it is exposed to the outside world, all changes made to its internal state leave the object at all times in the same observable state, and the state of the object remains valid under any arbitrary combination of those changes either happening or not happening.
An example of a reasonably good "immutable" object which obeys this principle is the Java String type. It includes a hash-code field which is initially zero, but which is used to memoize the result of querying the hash code. The state of a String with a zero hash-code field is semantically the same as that of a String where the hash-code field is filled in. If two threads simultaneously attempt to query the hash code, it's possible that both may end up performing the computation and storing the result, but it doesn't matter, because neither store will affect the observable state of the object. If a third thread comes along and queries the hash code, it might or might not see the stores from the first two, but the returned hash code will be the same regardless.
(BTW, my one quibble with Java's string-hashing method is that it's possible for the hash function to return zero for a non-empty string, in which case the result is never cached. I would have thought it better to have the hash function test for zero and substitute something else. For example, if the hashing step is such that adding a single character to a string whose hash is zero will always yield a non-zero hash, simply return the hash of the string minus the last character. Otherwise the worst-case time to hash a long string thousands of times may be much worse than the normal-case time.)
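The racy-but-benign memoization pattern described above, including a guard against caching the zero sentinel, can be sketched in Python. ImmutablePoint is a made-up type for illustration, not anything from Java or the standard library, and the zero-substitution here is simpler than the scheme proposed in the quibble:

```python
class ImmutablePoint:
    """Immutable value whose hash is computed lazily and memoized.

    The cached field starts at 0, meaning "not computed yet". Two threads
    recomputing it concurrently is harmless: both store the same value,
    so the object's observable state never changes.
    """
    def __init__(self, x, y):
        self._x = x
        self._y = y
        self._hash = 0  # 0 is the "not computed yet" sentinel

    def __hash__(self):
        h = self._hash
        if h == 0:
            h = hash((self._x, self._y))
            # Avoid caching 0 itself; otherwise a value whose real hash
            # is 0 would be recomputed on every call.
            if h == 0:
                h = -1
            self._hash = h  # benign race: any writer stores the same h
        return h

p = ImmutablePoint(3, 4)
print(hash(p) == hash(p))  # True: the memoized result is stable
```

Every observer sees the same hash code whether or not the cached field has been filled in yet, which is what keeps the object observably immutable.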
The big things to beware of are (1) sequences of operations which change the state of an object and then change it back, and (2) substitution of objects that appear to have the same state but don't. Ironically, Microsoft's fixing of what it regarded as a bug in its default Struct.Equals method makes #2 harder to protect against. If one has a number of immutable objects which hold references to what appear to be identical immutable objects, replacing all those references with references to one of those immutable objects should be safe. Unfortunately, the Equals overrides for the system types Decimal, Double, and Float will sometimes report true even when comparing slightly different values. It used to be that wrapping one of those types in a struct and calling Equals on that struct would test for true equivalence, but Microsoft changed things so that a struct will report Equals as true if its members do, even if those members hold non-equivalent values of the aforementioned types.
Best Answer
With lists in functional languages, you nearly always work with a head and a tail: the first element and the remainder of the list. Prepending is much more common because, as you surmised, appending requires copying the entire list (or using some other lazy data structure that doesn't precisely resemble a linked list).
In imperative languages, appending is much more common, because it tends to feel more natural semantically, and you don't care about invalidating references to previous versions of the list.
As an example of why prepending doesn't require copying the entire list, consider you have:

2 -> 3 -> 4

Prepending a 1 gives you:

1 -> 2 -> 3 -> 4

But note that it doesn't matter if someone else is still holding a reference to 2 as the head of their list, because the list is immutable and the links only go one way. There's no way to tell the 1 is even there if you only have a reference to 2. Now, if you appended a 5 onto either list, you'd have to make a copy of the entire list, because otherwise it would appear on the other list as well.
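A minimal sketch of this sharing in Python, using a hand-rolled Cons cell (illustrative, not a standard type):

```python
class Cons:
    """A node in an immutable singly linked list."""
    __slots__ = ("head", "tail")

    def __init__(self, head, tail=None):
        self.head = head
        self.tail = tail  # None marks the end of the list

def to_list(node):
    """Walk the links and collect the elements for display."""
    out = []
    while node is not None:
        out.append(node.head)
        node = node.tail
    return out

def append(node, value):
    """Appending must copy every node, since tails can't be mutated."""
    if node is None:
        return Cons(value)
    return Cons(node.head, append(node.tail, value))

old = Cons(2, Cons(3, Cons(4)))  # the list 2 -> 3 -> 4
new = Cons(1, old)               # prepend: O(1), shares all of `old`

print(to_list(new))       # [1, 2, 3, 4]
print(to_list(old))       # [2, 3, 4]: unaffected, still shared
print(new.tail is old)    # True: no copying happened

appended = append(old, 5)                    # append: copies every node
print(appended.tail.tail is old.tail.tail)   # False: fresh copies
```

The prepend is a single allocation because the links only go one way; the append has to rebuild the whole spine so that `old` doesn't observably change.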