Java – Why Arrays Do Not Override equals()


I was working with a HashSet the other day, which has this written in the spec:

[add()] adds the specified element e to this set if this set contains no element e2 such that (e==null ? e2==null : e.equals(e2))

I was using char[] in the HashSet until I realized that, based on this contract, it was no better than an ArrayList at rejecting duplicates! Since it uses the non-overridden .equals() (and hashCode()), my arrays are only checked for reference equality, which is not particularly useful. I know that Arrays.equals() exists, but that doesn't help when using collections such as HashSet.
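A minimal sketch of the problem, plus two common workarounds. The class names CharArraySetDemo and CharArrayKey are hypothetical illustrations, not JDK types. The wrapper delegates to the real Arrays.equals(char[], char[]) and Arrays.hashCode(char[]) methods. For char[] specifically, converting to a String is usually the simplest option.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class CharArraySetDemo {

    // Hypothetical wrapper (not part of the JDK): gives a char[] content-based
    // equals() and hashCode() so it behaves sensibly as a HashSet element.
    static final class CharArrayKey {
        private final char[] chars;

        CharArrayKey(char[] chars) {
            this.chars = chars.clone(); // defensive copy: later mutation can't corrupt the set
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof CharArrayKey
                    && Arrays.equals(chars, ((CharArrayKey) o).chars);
        }

        @Override
        public int hashCode() {
            return Arrays.hashCode(chars);
        }
    }

    public static void main(String[] args) {
        char[] a = {'h', 'i'};
        char[] b = {'h', 'i'}; // same contents, different object

        // Raw arrays use the identity equals()/hashCode() inherited from Object.
        Set<char[]> raw = new HashSet<>();
        raw.add(a);
        raw.add(b);
        System.out.println(raw.size());                           // 2: both arrays are kept
        System.out.println(raw.contains(new char[] {'h', 'i'}));  // false

        // Wrapped arrays compare by content.
        Set<CharArrayKey> wrapped = new HashSet<>();
        wrapped.add(new CharArrayKey(a));
        wrapped.add(new CharArrayKey(b));
        System.out.println(wrapped.size());                       // 1

        // For char[], a String key is often simplest.
        Set<String> strings = new HashSet<>();
        strings.add(new String(a));
        strings.add(new String(b));
        System.out.println(strings.size());                       // 1
    }
}
```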

So my question is, why would Java arrays not override equals?

Best Answer

There was a design decision to make early on in Java:

Are arrays primitives, or are they Objects?

The answer is: neither, really... or both, if you look at it another way. They work quite closely with the system itself and with the backend of the JVM.

One example of this is the java.lang.System.arraycopy() method, which needs to accept an array of any type. Its parameters therefore have to be typed as something every array inherits from, and that is Object. And arraycopy is a native method.
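To make the arraycopy point concrete, here is a small sketch. ArrayCopyDemo is a hypothetical class name. System.arraycopy(Object src, int srcPos, Object dest, int destPos, int length) is the real JDK signature. The last step shows the cost of the Object-typed parameters: passing something that isn't an array compiles fine and fails only at run time.

```java
public class ArrayCopyDemo {
    public static void main(String[] args) {
        int[] ints = {1, 2, 3, 4};
        int[] intCopy = new int[ints.length];
        String[] strs = {"a", "b", "c"};
        String[] strCopy = new String[strs.length];

        // arraycopy's src and dest parameters are typed as Object, so the same
        // native method accepts both a primitive array and a reference array.
        System.arraycopy(ints, 0, intCopy, 0, ints.length);
        System.arraycopy(strs, 0, strCopy, 0, strs.length);
        System.out.println(intCopy[3] + " " + strCopy[2]);   // 4 c

        // Every array class, even int[], has Object as its direct superclass.
        System.out.println(int[].class.getSuperclass());     // class java.lang.Object
        System.out.println(int[].class.getComponentType());  // int

        // The price of the Object-typed signature: type errors surface at run time.
        try {
            System.arraycopy("not an array", 0, intCopy, 0, 1);
        } catch (ArrayStoreException e) {
            System.out.println("ArrayStoreException");
        }
    }
}
```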

Arrays are also funny in that they can hold primitives (int, char, double, etc.), while the other collections can only hold Objects. Look, for example, at java.util.Arrays and the ugliness of its equals methods, with a separate overload for each primitive array type plus one for Object[]. This was put in as an afterthought: deepEquals(Object[], Object[]) wasn't added until 1.5, while the Arrays class itself dates from 1.2.
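A short sketch of why deepEquals was needed, using a hypothetical demo class ArraysEqualsDemo. Arrays.equals(Object[], Object[]) compares elements with their own equals(). When those elements are themselves arrays, that is the identity comparison inherited from Object. Arrays.deepEquals recurses into nested arrays instead.

```java
import java.util.Arrays;

public class ArraysEqualsDemo {
    public static void main(String[] args) {
        // One equals overload per primitive element type, plus one for Object[].
        System.out.println(Arrays.equals(new int[] {1, 2}, new int[] {1, 2}));  // true
        System.out.println(Arrays.equals(new char[] {'x'}, new char[] {'x'}));  // true

        int[][] m1 = {{1, 2}, {3, 4}};
        int[][] m2 = {{1, 2}, {3, 4}};

        // Arrays.equals(Object[], Object[]) calls equals() on each element;
        // the elements are int[] rows, which only have identity equality.
        System.out.println(Arrays.equals(m1, m2));      // false

        // deepEquals (since 1.5) recurses into the nested arrays.
        System.out.println(Arrays.deepEquals(m1, m2));  // true

        // equals() on the outer arrays themselves is a reference comparison.
        System.out.println(m1.equals(m2));              // false
    }
}
```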

Because these objects are arrays, they let you do some things at the memory or near-memory level, which Java usually hides from the coder. This allows certain things to be done faster, at the expense of mostly breaking the object model.

There was an early trade-off in the system between flexibility and performance. Performance won out, and the lack of flexibility was wrapped up in the various collections. Arrays in Java are a thinly implemented Object on top of a primitive type, (originally) intended for working with the system when you need it.
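The collections are where content-based equality lives. A minimal sketch with a hypothetical class CollectionsVsArraysDemo: two Integer[] arrays with the same contents are not equal, but List views or copies of them are. The List.equals and List.hashCode contracts are defined element by element, even across different List implementations.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CollectionsVsArraysDemo {
    public static void main(String[] args) {
        Integer[] arr1 = {1, 2, 3};
        Integer[] arr2 = {1, 2, 3};

        // Arrays: identity comparison inherited from Object.
        System.out.println(arr1.equals(arr2));               // false

        // Lists: element-by-element comparison, per the List.equals contract,
        // even though these are two different List implementations.
        List<Integer> view = Arrays.asList(arr1);            // fixed-size view backed by arr1
        List<Integer> copy = new ArrayList<>(Arrays.asList(arr2));
        System.out.println(view.equals(copy));               // true
        System.out.println(view.hashCode() == copy.hashCode()); // true
    }
}
```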

For the most part, it appears that the original designers tried to ignore raw arrays and tuck them away in the system. They also wanted arrays to be fast (early Java had some issues with speed). It is a wart on the design that arrays aren't nice Arrays (proper objects with their own behavior), but it's one that was needed when you wanted to expose something as close to the system as possible. For that matter, the languages contemporary with early Java have this wart too: one can't call .equals() on a C++ array.

Java and C++ took the same path for arrays: an external library performs operations on arrays as needed, instead of arrays being Arrays with their own methods. Both languages also encourage coders to use better, higher-level types (the collections in Java, containers such as std::vector in C++) unless they really know what they are doing and why they are doing it that way.

Thus, the way .equals is implemented for arrays is wrong, but it's the same wrong that coders coming from C++ already knew. So the designers chose the least wrong option in terms of performance and left it as the implementation inherited from Object: two objects are equal if and only if they refer to the same object.
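A quick check of what "left as Object's implementation" means in practice, using a hypothetical class ArrayIdentityDemo. Arrays override neither equals() nor hashCode(), so hashCode() returns the same value as System.identityHashCode for the same array.

```java
public class ArrayIdentityDemo {
    public static void main(String[] args) {
        char[] a = {'o', 'k'};
        char[] b = {'o', 'k'};
        char[] alias = a;

        // Arrays inherit Object.equals: true only for the very same object.
        System.out.println(a.equals(b));      // false (same contents, different objects)
        System.out.println(a.equals(alias));  // true  (same object)

        // hashCode is Object's identity-based hash as well.
        System.out.println(a.hashCode() == System.identityHashCode(a)); // true
    }
}
```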

You need the array to be a primitive-like structure to communicate with native bindings: something as close to the classic C array as possible. But unlike the other primitives, the array has to be passed as a reference, and thus has to be an Object. So it's more of a primitive with some Object hacks on the side, plus bounds checking.
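The Java-side half of that description can be sketched directly. This sketch does not cover native bindings (JNI), which can't be shown in a self-contained snippet. ArrayReferenceDemo and fill are hypothetical names. The sketch shows that a method receives a copy of the array reference, so its writes land in the caller's array, and that every element access is bounds-checked at run time.

```java
import java.util.Arrays;

public class ArrayReferenceDemo {
    // Receives a copy of the reference, so writes go to the caller's array.
    static void fill(int[] target, int value) {
        for (int i = 0; i < target.length; i++) {
            target[i] = value;
        }
    }

    public static void main(String[] args) {
        int[] data = new int[3];
        fill(data, 7);
        System.out.println(Arrays.toString(data)); // [7, 7, 7]

        // Unlike a raw C array, every element access is bounds-checked.
        try {
            System.out.println(data[3]);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("caught " + e.getClass().getSimpleName());
        }
    }
}
```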