Object Identity and Mutability in Java and C++

c immutability java memory object-oriented

I was reading a proposal for value types in Java, and I came across this sentence: "Object identity serves only to support mutability, where an object’s state can be mutated but remains the same intrinsic object."

From what I understand (albeit tentatively), object identity is the idea that a variable acts as a pointer or reference to an object located elsewhere in memory (such as objects instantiated on the heap in Java or C#). So what does this have to do with object mutability? Does it imply that, for example, objects instantiated on the stack in C++ are immutable? I'm having trouble seeing the link here.

Best Answer

Before tackling identity, let's define what we mean by equality a little more precisely. We say two things are equal if and only if we can't tell them apart (see: Identity of indiscernibles). That means that whether two things are equal or not depends on the means we have to inspect them.

Let's think about that some more in programming terms. Let's leave our preconceptions at the door and suppose we're working in a brand-new unknown language where all variables and values are immutable. By the definition above, two values A and B are equal if and only if there are NO programs in the language that yield different results when A is used in place of B or vice-versa. Let's say A and B are (IEEE 754) floats that are both zero, so substituting either one into the expression _ + 1.0 yields 1.0. Are they equal? That depends: does the language provide any function that lets me determine the sign of a zero? If it doesn't, they're equal; if it does, they may not be.
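To make the signed-zero point concrete, here is a minimal Java sketch. The class and method names (SignedZeroDemo, plusOne, reciprocal) are made up for illustration; the only library call is Double.compare. Java happens to be a language that does expose the sign of a zero, so whether 0.0 and -0.0 count as "equal" depends on which operations you allow yourself.

```java
// Hypothetical demo class; names are illustrative and not from the original answer.
final class SignedZeroDemo {
    // An observation that cannot tell the two zeros apart: _ + 1.0
    static double plusOne(double x) {
        return x + 1.0;
    }

    // An observation that can: 1.0 / _ gives +Infinity for 0.0 and -Infinity for -0.0.
    static double reciprocal(double x) {
        return 1.0 / x;
    }

    public static void main(String[] args) {
        double a = 0.0;
        double b = -0.0;

        System.out.println(plusOne(a) == plusOne(b));       // true
        System.out.println(a == b);                         // true: == treats both zeros as equal
        System.out.println(reciprocal(a) == reciprocal(b)); // false: Infinity vs -Infinity
        System.out.println(Double.compare(a, b) == 0);      // false: compare distinguishes the zeros
    }
}
```

If the language only offered plusOne and ==, the two zeros would be indistinguishable and therefore equal by the definition above; reciprocal and Double.compare are what break the tie.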

So two values are equal any time they give the same results for all possible combinations of operations they support. Immutable values in particular don't produce different results depending on which operations were previously applied to them. For that reason, we don't care if two variables point to two copies of the same value or if they both point to the same copy.

What does this have to do with mutability? Mutability implies our language has some notion of a memory cell whose contents can be overwritten. Let's say we add support for mutable memory cells to our language:

  • ref <value> creates a new memory cell, distinct from all others, initialized to <value>.
  • <variable> := <value> overwrites the contents of a reference cell.
  • !<variable> returns the value currently stored in a reference cell.

Now let's think about what equality means for memory cells. Suppose A = ref 0 and B = A. Consider this program:

A := 1
print(!_)

Substituting A for the blank prints 1, and substituting B prints 1 as well. Now suppose instead that A = ref 0 and B = ref 0. In this case, substituting A prints 1 but substituting B prints 0, since A and B now point to distinct memory cells.
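The same experiment can be sketched in Java. The Ref class below is a hypothetical stand-in for the made-up language's ref, := and ! operations (it is not a standard library type), and run plays the role of the two-line program, returning what would be printed.

```java
// Hypothetical sketch; Ref and run are illustrative names, not an existing API.
final class AliasDemo {
    // A mutable memory cell.
    static final class Ref<T> {
        private T value;
        Ref(T initial) { this.value = initial; }          // ref <value>
        void set(T newValue) { this.value = newValue; }   // <variable> := <value>
        T get() { return value; }                         // !<variable>
    }

    // Runs "A := 1; print(!_)" with `blank` substituted for the hole,
    // and returns the value that would be printed.
    static int run(Ref<Integer> a, Ref<Integer> blank) {
        a.set(1);
        return blank.get();
    }

    public static void main(String[] args) {
        // Case 1: B = A, so both names refer to one cell.
        Ref<Integer> a = new Ref<>(0);
        Ref<Integer> b = a;
        System.out.println(run(a, b)); // 1

        // Case 2: two distinct cells that happen to hold the same value.
        Ref<Integer> c = new Ref<>(0);
        Ref<Integer> d = new Ref<>(0);
        System.out.println(run(c, d)); // 0
    }
}
```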

So it does matter to us whether two references point to the same memory cell or to different ones. Since that matters, it'd be useful to have an efficient and general way of telling two references apart. Our current method (compare the values they hold and, if those are equal, mutate one cell and check whether the other changes too) is troublesome for a number of reasons:

  • It depends on being able to compare the values stored in the memory cells for equality. Equality doesn't make sense for all types - for example, it's generally meaningless for functions, because there's no general method to determine if two unknown functions are equal (this is venturing into Halting Problem territory). So given two references to memory cells storing functions, we can't compare the functions they hold for equality.
  • It depends on having some value that we can assign to one of the two references. So even if equality made sense for all types in the language, we still need access to a value for each type we want to compare. What if constructing a value of that type has side effects?
  • The value we use to mutate one of the references must be different from the value the memory cell already holds, so we actually need two candidate values.
  • The code to compare references of different types will look exactly the same save for the two values we use.
  • We need to back up and restore the value of the reference we mutate to avoid changing the meaning of the program.
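For illustration, here is roughly what that mutation-based test looks like when written out in Java. Everything here is hypothetical: Ref is a hand-rolled cell type, sameCell is an invented helper, and the sketch assumes the cells never hold null and that the caller supplies two values v1 and v2 that are not equal to each other.

```java
// Hypothetical sketch of the "compare, mutate, observe, restore" test.
final class MutationProbe {
    static final class Ref<T> {
        private T value;
        Ref(T initial) { this.value = initial; }
        void set(T newValue) { this.value = newValue; }
        T get() { return value; }
    }

    // Decides whether a and b are the same cell using only get, set and equals.
    // Assumes non-null contents and v1 not equal to v2 (caller's responsibility).
    static <T> boolean sameCell(Ref<T> a, Ref<T> b, T v1, T v2) {
        // Needs value equality on T: different contents mean different cells.
        if (!a.get().equals(b.get())) {
            return false;
        }
        T saved = a.get();                     // back up the current contents
        T probe = saved.equals(v1) ? v2 : v1;  // needs two values to find one that differs
        a.set(probe);                          // mutate one cell...
        boolean same = b.get().equals(probe);  // ...and see whether the other changed
        a.set(saved);                          // restore, so the program's meaning is unchanged
        return same;
    }

    public static void main(String[] args) {
        Ref<Integer> a = new Ref<>(0);
        Ref<Integer> b = a;
        Ref<Integer> c = new Ref<>(0);
        System.out.println(sameCell(a, b, 0, 1)); // true
        System.out.println(sameCell(a, c, 0, 1)); // false
    }
}
```

Each comment maps to one of the objections in the list: the helper relies on equals, needs two spare values per type, and has to save and restore state just to answer a yes/no question.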

So it'd be useful for the language to provide an operation that directly checks whether two references point to the same mutable memory cell. Such an operation is pointless for immutable values; in fact, I'd say it's downright harmful. If there were a way to tell whether two 1s are stored in different places in memory, then there could be programs that care whether I pass one 1 or the other. I really don't want to worry about whether I have "the right 1"; math is hard enough as it is! So it's clear that being able to check whether two references share a memory cell is mainly useful for mutable types.
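In Java, the built-in operation of this kind is == applied to references. The sketch below (again with a hypothetical Ref class and invented helper names) shows it doing useful work for mutable cells, and then shows the "right 1" problem when the same operator is pointed at immutable strings created with new String.

```java
// Hypothetical sketch; Ref, sameObject and sameValue are illustrative names.
final class IdentityDemo {
    static final class Ref<T> {
        private T value;
        Ref(T initial) { this.value = initial; }
        void set(T newValue) { this.value = newValue; }
        T get() { return value; }
    }

    // Identity: do both references point at the same object?
    static boolean sameObject(Object x, Object y) {
        return x == y;
    }

    // Value equality: do the two strings have the same contents?
    static boolean sameValue(String x, String y) {
        return x.equals(y);
    }

    public static void main(String[] args) {
        // Useful for mutable cells: no probing, no backup, no spare values.
        Ref<Integer> a = new Ref<>(0);
        Ref<Integer> b = a;
        Ref<Integer> c = new Ref<>(0);
        System.out.println(sameObject(a, b)); // true
        System.out.println(sameObject(a, c)); // false

        // Unhelpful for immutable values: two "1"s that behave identicallyhttps
        // except under ==, so code can now care which copy it was handed.
        String one = new String("1");
        String otherOne = new String("1");
        System.out.println(sameValue(one, otherOne));  // true
        System.out.println(sameObject(one, otherOne)); // false
    }
}
```

The last line is the situation the quoted proposal wants to avoid: for a value with no mutable state, identity is the only thing that separates the two copies.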
