While assisting a student with a university project, we worked on a Java exercise provided by the university which defined a class for an address with the fields:
number
street
city
zipcode
And it specified that the equals logic should return true if the number and zip code match.
I was once taught that the equals method should only do an exact comparison between the objects (after checking the pointer), which makes some sense to me but contradicts the task they were given.
I can see why you would want to override the logic so that you can use things like list.contains() with your partial matching, but I'm wondering if this is considered kosher, and if not, why not?
Best Answer
Defining Equality For Two Objects
Equality can be arbitrarily defined for any two objects. There is no strict rule that forbids someone from defining it any way they want. However, equality is usually defined in a way that is meaningful for the domain rules of what is being implemented.
It is expected to follow the equivalence relation contract: it should be reflexive, symmetric, and transitive. Java's documentation for Object.equals() additionally asks that the result be consistent across repeated calls and that x.equals(null) return false.
In your example, perhaps there is no need to distinguish two addresses that have the same zipcode and number. There are domains where it is perfectly reasonable to expect two such addresses to compare as equal, even when their other fields differ.
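As a minimal sketch of that expectation, assuming a hypothetical Address class with the four fields from the question (the constructor, field types, and sample values are my own assumptions, not taken from the exercise):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical Address: only number and zipcode take part in equality.
final class Address {
    private final int number;
    private final String street;
    private final String city;
    private final String zipcode;

    Address(int number, String street, String city, String zipcode) {
        this.number = number;
        this.street = street;
        this.city = city;
        this.zipcode = zipcode;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;                 // same reference
        if (!(other instanceof Address)) return false;  // also rejects null
        Address that = (Address) other;
        return number == that.number && Objects.equals(zipcode, that.zipcode);
    }

    @Override
    public int hashCode() {
        // Built from the same fields as equals() so the two stay consistent.
        return Objects.hash(number, zipcode);
    }
}

class AddressEqualityDemo {
    public static void main(String[] args) {
        Address a = new Address(10, "Main Street", "Springfield", "12345");
        Address b = new Address(10, "Main St.", "springfield", "12345");

        List<Address> known = new ArrayList<>();
        known.add(a);

        System.out.println(a.equals(b));        // true: number and zipcode match
        System.out.println(known.contains(b));  // true: contains() relies on equals()
    }
}
```

Street and city are stored but deliberately ignored by equals(), which is exactly the design decision the exercise asks for.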
This can be useful, as you mentioned, for when you do not care about them being different objects - you only care about the values they hold. Perhaps zipcode + street number are enough for you to identify the correct address and the remaining information is "extra", and you don't want that extra information to affect your equality logic.
This could be a perfectly good model for a piece of software. Just make sure there is some documentation or unit tests to pin down this behavior, and that the public API reflects this use.
Do Not Forget About hashCode()
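One additional detail relevant for implementation is that many languages rely heavily on the concept of a hash code. Those languages, Java included, usually assume the following proposition: if two objects are equal, they must have the same hash code.
The Java documentation for Object.hashCode() states the same requirement: if two objects are equal according to equals(), calling hashCode() on each of them must produce the same integer.
Note that the converse does not hold: having the same hashCode does not mean that two objects are equal!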
In that sense, when one implements equality, one should also implement a hashCode() that follows the property mentioned above. This hashCode() is used by data structures for efficiency and to guarantee upper bounds on the complexity of their operations.
Coming up with a good hash code function is hard and an entire topic in itself. Ideally, the hash codes of two different objects should be different, or at least evenly distributed across the instances that actually occur.
But keep in mind that even a trivial implementation, such as one that returns the same constant for every object, still fulfills the property above, even though it is not a "good" hash function.
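A sketch of such a trivial implementation, assuming a hypothetical ConstantHashAddress class reduced to the two fields that define equality (the class name and the constant 1 are arbitrary choices of mine):

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical class: a constant hashCode() is legal, just slow in hash-based collections.
final class ConstantHashAddress {
    private final int number;
    private final String zipcode;

    ConstantHashAddress(int number, String zipcode) {
        this.number = number;
        this.zipcode = zipcode;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof ConstantHashAddress)) return false;
        ConstantHashAddress that = (ConstantHashAddress) other;
        return number == that.number && Objects.equals(zipcode, that.zipcode);
    }

    @Override
    public int hashCode() {
        // Equal objects trivially share this value, so the contract holds.
        // Every instance collides with every other one, though.
        return 1;
    }
}

class ConstantHashDemo {
    public static void main(String[] args) {
        Set<ConstantHashAddress> set = new HashSet<>();
        set.add(new ConstantHashAddress(10, "12345"));

        // Still found, because lookup falls back to equals() among colliding entries.
        System.out.println(set.contains(new ConstantHashAddress(10, "12345"))); // true
    }
}
```

The collection stays correct; what is lost is performance, since every lookup has to work through colliding entries.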
A more common way of implementing hash code is to take the hash codes of the fields that define your equality (in your example, zipcode and street number) and combine them with a binary operation.
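One possible sketch of that approach, assuming a hypothetical HashedAddress class with a non-null zipcode; the multiply-by-31-and-add combination is a common convention rather than a requirement:

```java
// Hypothetical class: hashCode() combines the hash codes of the equality fields.
final class HashedAddress {
    private final int number;
    private final String zipcode; // assumed non-null in this sketch

    HashedAddress(int number, String zipcode) {
        this.number = number;
        this.zipcode = zipcode;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof HashedAddress)) return false;
        HashedAddress that = (HashedAddress) other;
        return number == that.number && zipcode.equals(that.zipcode);
    }

    @Override
    public int hashCode() {
        // Combine the per-field hash codes: multiply by a small odd prime, then add.
        return 31 * Integer.hashCode(number) + zipcode.hashCode();
    }
}

class HashedAddressDemo {
    public static void main(String[] args) {
        HashedAddress a = new HashedAddress(10, "12345");
        HashedAddress b = new HashedAddress(10, "12345");
        System.out.println(a.equals(b));                  // true
        System.out.println(a.hashCode() == b.hashCode()); // true
    }
}
```

In everyday Java code, java.util.Objects.hash(number, zipcode) gives a similar result with less typing.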
When Ambiguous, Choose Clarity
Here is where I make a distinction about what one should expect regarding equality. Different people have different expectations regarding equality and if you are looking to follow the Principle of Least Astonishment you can consider other options to better describe your design.
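As an illustration, here are a few pairs of values, all picked by me from the Java standard library, where reasonable people could disagree about whether the two sides are "equal":

```java
import java.math.BigDecimal;

class AmbiguousEqualityDemo {
    public static void main(String[] args) {
        // Same text, different letter case.
        System.out.println("Main St".equals("MAIN ST"));            // false
        System.out.println("Main St".equalsIgnoreCase("MAIN ST"));  // true

        // Same numeric value, different scale.
        BigDecimal one = new BigDecimal("1.0");
        BigDecimal alsoOne = new BigDecimal("1.00");
        System.out.println(one.equals(alsoOne));          // false: equals() compares scale too
        System.out.println(one.compareTo(alsoOne) == 0);  // true: numeric comparison only

        // Values that differ only by floating-point rounding error.
        double sum = 0.1 + 0.2;
        System.out.println(sum == 0.3);                   // false
        System.out.println(Math.abs(sum - 0.3) < 1e-9);   // true
    }
}
```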
Which of those should be considered equal?
A case could be made for each one of those being true or false. When in doubt, one can define a different relation that is clearer in the context of the domain.
For instance, you could define isSameLocation(Address a) or, in the case of Vectors, isInRangeOf(Vector v, float range).
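A rough sketch of both methods follows. The class layouts are hypothetical, the two-dimensional Vector and its Euclidean distance check are my own assumptions about what "in range" means, and equals() is intentionally left at its default:

```java
// Hypothetical Address that keeps the default (identity) equals()
// and exposes the domain rule under an explicit name.
final class Address {
    private final int number;
    private final String zipcode; // assumed non-null in this sketch

    Address(int number, String zipcode) {
        this.number = number;
        this.zipcode = zipcode;
    }

    // True when both addresses share number and zipcode.
    boolean isSameLocation(Address a) {
        return number == a.number && zipcode.equals(a.zipcode);
    }
}

// Hypothetical 2D vector (not java.util.Vector).
final class Vector {
    private final float x;
    private final float y;

    Vector(float x, float y) {
        this.x = x;
        this.y = y;
    }

    // True when v lies within `range` of this vector (Euclidean distance).
    boolean isInRangeOf(Vector v, float range) {
        float dx = x - v.x;
        float dy = y - v.y;
        return dx * dx + dy * dy <= range * range; // compare squares, no sqrt needed
    }
}

class NamedRelationDemo {
    public static void main(String[] args) {
        Address a = new Address(10, "12345");
        Address b = new Address(10, "12345");
        System.out.println(a.isSameLocation(b)); // true
        System.out.println(a.equals(b));         // false: different objects

        Vector origin = new Vector(0f, 0f);
        System.out.println(origin.isInRangeOf(new Vector(3f, 4f), 5f)); // true
    }
}
```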
This way, you describe your design intent for equality more clearly, and you avoid breaking future readers' expectations about what your code actually does. (Just take a look at all the slightly different answers here to see how people's expectations vary regarding the equality relation in your example.)