Is It a Good Idea to Have Logic in the Equals Method That Doesn’t Do Exact Matching?

While assisting a student with a university project, we worked on a Java exercise provided by the university which defined a class for an address with the fields:

number
street
city
zipcode

And it specified that the equals logic should return true if the number and zip code match.

I was once taught that the equals method should only do an exact comparison between the objects (after checking the pointer), which makes some sense to me but contradicts the task they were given.

I can see why you would want to override the logic so that you can use things like list.contains() with your partial matching, but I'm wondering if this is considered kosher, and if not, why not?

Best Answer

Defining Equality For Two Objects

Equality can be defined arbitrarily for any two objects. There is no strict rule that forbids you from defining it any way you want. In practice, however, equality is defined in whatever way is meaningful for the domain rules being implemented.

In Java, an equals implementation is expected to follow the equivalence relation contract documented for Object.equals:

  • It is reflexive: for any non-null reference value x, x.equals(x) should return true.
  • It is symmetric: for any non-null reference values x and y, x.equals(y) should return true if and only if y.equals(x) returns true.
  • It is transitive: for any non-null reference values x, y, and z, if x.equals(y) returns true and y.equals(z) returns true, then x.equals(z) should return true.
  • It is consistent: for any non-null reference values x and y, multiple invocations of x.equals(y) consistently return true or consistently return false, provided no information used in equals comparisons on the objects is modified.
  • For any non-null reference value x, x.equals(null) should return false.

In your example, perhaps there is no need to treat two addresses that have the same zipcode and number as different. There are domains in which it is perfectly reasonable to expect the following code to work:

Address a1 = new Address("123","000000-0","Street Name","City Name");
Address a2 = new Address("123","000000-0","Str33t N4me","C1ty N4me");
assert a1.equals(a2);

This can be useful, as you mentioned, for when you do not care about them being different objects - you only care about the values they hold. Perhaps zipcode + street number are enough for you to identify the correct address and the remaining information is "extra", and you don't want that extra information to affect your equality logic.

This could be perfectly good modeling for a piece of software. Just make sure there is documentation or unit tests that pin down this behavior, and that the public API reflects this use.
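As a minimal sketch of the exercise's rule, here is one way such an Address could look. The class layout is an assumption: the field order follows the constructor calls above (number, zipcode, street, city), and hashCode is overridden alongside equals using the same two fields.

```java
import java.util.List;
import java.util.Objects;

// Hypothetical Address class; constructor order assumed to be
// (number, zipcode, street, city), as in the snippet above.
final class Address {
    private final String number;
    private final String zipcode;
    private final String street;
    private final String city;

    Address(String number, String zipcode, String street, String city) {
        this.number = number;
        this.zipcode = zipcode;
        this.street = street;
        this.city = city;
    }

    /** Equal when number and zipcode match; street and city are ignored. */
    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof Address)) {
            return false; // also covers other == null
        }
        Address that = (Address) other;
        return Objects.equals(number, that.number)
                && Objects.equals(zipcode, that.zipcode);
    }

    /** Built from exactly the fields that equals compares. */
    @Override
    public int hashCode() {
        return Objects.hash(number, zipcode);
    }
}

class AddressEqualityDemo {
    public static void main(String[] args) {
        Address a1 = new Address("123", "000000-0", "Street Name", "City Name");
        Address a2 = new Address("123", "000000-0", "Str33t N4me", "C1ty N4me");
        List<Address> known = List.of(a1);
        System.out.println(a1.equals(a2));      // true
        System.out.println(known.contains(a2)); // true
    }
}
```

Because the comparison is an exact match on a fixed subset of fields, it stays reflexive, symmetric, and transitive.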


Do Not Forget About hashCode()

One additional implementation detail is that many languages rely heavily on hash codes. Those languages, Java included, usually assume the following property:

If x.equals(y) then x.hashCode() and y.hashCode() are the same.

From the Javadoc for Object.equals:

Note that it is generally necessary to override the hashCode method whenever this method (equals) is overridden, so as to maintain the general contract for the hashCode method, which states that equal objects must have equal hash codes.

Note that having the same hashCode does not mean that two objects are equal!
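A small illustration of that last point, using nothing but java.lang.String, whose hash function is fixed by its documentation: the strings "Aa" and "BB" share a hash code yet are not equal. The class name is just a placeholder for the demo.

```java
class HashCollisionDemo {
    public static void main(String[] args) {
        String first = "Aa";
        String second = "BB";
        // 'A' * 31 + 'a' = 65 * 31 + 97 = 2112, and 'B' * 31 + 'B' = 66 * 31 + 66 = 2112
        System.out.println(first.hashCode() == second.hashCode()); // true
        System.out.println(first.equals(second));                  // false
    }
}
```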

So whenever you implement equals, you should also implement a hashCode() that satisfies the property above. Hash-based data structures such as HashMap and HashSet rely on hashCode() for efficiency and for the expected bounds on the cost of their operations.

Coming up with a good hash function is hard and an entire topic in itself. Ideally, objects that are not equal should have different hash codes, or at least hash codes that are evenly distributed across the instances you expect to see.

But keep in mind that the following simple implementation still satisfies the hashCode contract, even though it is not a "good" hash function: every object lands in the same bucket, so hash-based collections degrade toward linear search.

public int hashCode() {
    return 0;
}

A more common approach is to take the hash codes of the fields that define your equality and combine them with a binary operation. In your example, those fields are the zipcode and the street number. It is often done like this:

public int hashCode() {
    return this.zipCode.hashCode() ^ this.streetNumber.hashCode();
}
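One caveat with plain XOR is that it is symmetric, so swapping the two field values yields the same hash. A possible alternative, sketched here with a hypothetical AddressKey class that holds only the two identifying fields (both assumed non-null), is java.util.Objects.hash, which combines the values in an order-sensitive way.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical key type holding only the fields that define equality.
final class AddressKey {
    private final String zipCode;
    private final String streetNumber;

    AddressKey(String zipCode, String streetNumber) {
        this.zipCode = zipCode;
        this.streetNumber = streetNumber;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof AddressKey)) {
            return false;
        }
        AddressKey that = (AddressKey) other;
        return zipCode.equals(that.zipCode)
                && streetNumber.equals(that.streetNumber);
    }

    @Override
    public int hashCode() {
        // Order-sensitive: ("12", "34") and ("34", "12") get different hashes,
        // whereas zipCode.hashCode() ^ streetNumber.hashCode() would collide.
        return Objects.hash(zipCode, streetNumber);
    }
}

class AddressKeyDemo {
    public static void main(String[] args) {
        Set<AddressKey> seen = new HashSet<>();
        seen.add(new AddressKey("000000-0", "123"));
        // A distinct instance with the same field values is found in the set.
        System.out.println(seen.contains(new AddressKey("000000-0", "123"))); // true
        System.out.println(seen.contains(new AddressKey("000000-0", "124"))); // false
    }
}
```

Either version satisfies the contract; the difference only affects how well the keys spread across buckets.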

When Ambiguous, Choose Clarity

Here is where I make a distinction about what one should expect regarding equality. Different people have different expectations of equality, and if you want to follow the Principle of Least Astonishment, you can consider other options that describe your design better.

Which of these should be considered equal?

Address a1 = new Address("123","000000-0","Street Name","City Name");
Address a2 = new Address("123","000000-0","Str33t N4me","C1ty N4me");
assert a1.equals(a2); // Are typos the same address?

Address a3 = new Address("123","000000-0","John Street","SpringField");
Address a4 = new Address("123","000000-0","John St.","SpringField");
assert a3.equals(a4); // Are abbreviations the same address?

Vector3 v1 = new Vector3(1.0f, 1.0f, 1.0f);
Vector3 v2 = new Vector3(1.0f, 1.0f, 1.0f);
assert v1.equals(v2); // Should two vectors that have the same values be equal?

Vector3 v3 = new Vector3(1.000001f, 1.0f, 1.0f);
Vector3 v4 = new Vector3(1.0f, 1.0f, 1.0f);
assert v3.equals(v4); // What is the error tolerance?

A case could be made for each one of those being true or false. When in doubt, one can define a different relation that is clearer in the context of the domain.

For instance, you could define isSameLocation(Address a):

Address a1 = new Address("123","000000-0","John Street","SpringField");
Address a2 = new Address("123","000000-0","John St.","SpringField");

System.out.println(a1.equals(a2)); // false
System.out.println(a1.isSameLocation(a2)); // true
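A possible shape for such a class, as a sketch only: equals compares every field, while the hypothetical isSameLocation carries the looser number-plus-zipcode rule. The constructor order (number, zipcode, street, city) and the non-null fields are assumptions.

```java
import java.util.Objects;

// Hypothetical Address with strict equality and a separate, explicitly named relation.
final class Address {
    private final String number;
    private final String zipcode;
    private final String street;
    private final String city;

    Address(String number, String zipcode, String street, String city) {
        this.number = number;
        this.zipcode = zipcode;
        this.street = street;
        this.city = city;
    }

    /** Strict equality: every field must match. */
    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof Address)) {
            return false;
        }
        Address that = (Address) other;
        return number.equals(that.number)
                && zipcode.equals(that.zipcode)
                && street.equals(that.street)
                && city.equals(that.city);
    }

    @Override
    public int hashCode() {
        return Objects.hash(number, zipcode, street, city);
    }

    /** Domain rule: number and zipcode alone identify the location. */
    boolean isSameLocation(Address other) {
        return other != null
                && number.equals(other.number)
                && zipcode.equals(other.zipcode);
    }
}

class SameLocationDemo {
    public static void main(String[] args) {
        Address a1 = new Address("123", "000000-0", "John Street", "SpringField");
        Address a2 = new Address("123", "000000-0", "John St.", "SpringField");
        System.out.println(a1.equals(a2));         // false
        System.out.println(a1.isSameLocation(a2)); // true
    }
}
```

The trade-off is that collections such as List.contains() will use the strict rule, so lookups by location need an explicit loop or predicate.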

Or, in the case of vectors, isInRangeOf(Vector3 v, float range):

Vector3 v1 = new Vector3(1.000001f, 1.0f, 1.0f);
Vector3 v2 = new Vector3(1.0f, 1.0f, 1.0f);

System.out.println(v1.equals(v2)); // false
System.out.println(v1.isInRangeOf(v2, 0.01f)); // true
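Here is a rough sketch of such a Vector3. The semantics of isInRangeOf are an assumption (Euclidean distance no greater than range). The demo also hints at why a tolerance fits poorly inside equals itself: "within range" is not transitive, which the equals contract requires.

```java
import java.util.Objects;

// Hypothetical Vector3: exact equality plus a separate tolerance check.
final class Vector3 {
    private final float x;
    private final float y;
    private final float z;

    Vector3(float x, float y, float z) {
        this.x = x;
        this.y = y;
        this.z = z;
    }

    /** Exact, component-wise equality. */
    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof Vector3)) {
            return false;
        }
        Vector3 that = (Vector3) other;
        return Float.compare(x, that.x) == 0
                && Float.compare(y, that.y) == 0
                && Float.compare(z, that.z) == 0;
    }

    @Override
    public int hashCode() {
        return Objects.hash(x, y, z);
    }

    /** Assumed meaning: Euclidean distance to other is at most range. */
    boolean isInRangeOf(Vector3 other, float range) {
        double dx = x - other.x;
        double dy = y - other.y;
        double dz = z - other.z;
        return Math.sqrt(dx * dx + dy * dy + dz * dz) <= range;
    }
}

class VectorRangeDemo {
    public static void main(String[] args) {
        Vector3 v1 = new Vector3(1.000001f, 1.0f, 1.0f);
        Vector3 v2 = new Vector3(1.0f, 1.0f, 1.0f);
        System.out.println(v1.equals(v2));             // false
        System.out.println(v1.isInRangeOf(v2, 0.01f)); // true

        // "In range" is not transitive: a~b and b~c, but not a~c.
        Vector3 a = new Vector3(0.0f, 0.0f, 0.0f);
        Vector3 b = new Vector3(0.008f, 0.0f, 0.0f);
        Vector3 c = new Vector3(0.016f, 0.0f, 0.0f);
        System.out.println(a.isInRangeOf(b, 0.01f)); // true
        System.out.println(b.isInRangeOf(c, 0.01f)); // true
        System.out.println(a.isInRangeOf(c, 0.01f)); // false
    }
}
```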

This way, you better describe your design intent for equality, and you avoid breaking future readers' expectations about what your code actually does. (Just look at all the slightly different answers to see how people's expectations vary regarding the equality relation in your example.)
