C# – Overriding GetHashCode for mutable objects

c#, equals, gethashcode, .net, overriding

I've read about 10 different questions on when and how to override GetHashCode but there's still something I don't quite get. Most implementations of GetHashCode are based on the hash codes of the fields of the object, but it's been cited that the value of GetHashCode should never change over the lifetime of the object. How does that work if the fields that it's based on are mutable? Also what if I do want dictionary lookups etc to be based on reference equality not my overridden Equals?

I'm primarily overriding Equals to make unit testing my serialization code easier. I assume that serializing and deserializing (to XML in my case) destroys reference equality, so I want to make sure the result is at least correct by value equality. Is it bad practice to override Equals in this case? Basically, in most of the executing code I want reference equality; I always use == and I'm not overriding that. Should I just create a new method, ValueEquals or something, instead of overriding Equals? I used to assume that the framework always uses == and not Equals to compare things, so I thought it was safe to override Equals, since its purpose seemed to be providing a second definition of equality that differs from the == operator. From reading several other questions, though, it seems that's not the case.

EDIT:

It seems my intentions were unclear. What I mean is that 99% of the time I want plain old reference equality: default behavior, no surprises. In very rare cases I want value equality, and I want to request it explicitly by using .Equals instead of ==.

When I do this the compiler recommends that I override GetHashCode as well, and that's how this question came up. There seem to be contradictory goals for GetHashCode when it is applied to mutable objects, namely:

If a.Equals(b) then a.GetHashCode() should == b.GetHashCode().
The value of a.GetHashCode() should never change for the lifetime of a.

These goals seem naturally contradictory for a mutable object: if the state of the object changes, we expect the result of .Equals() to change, which means GetHashCode should change to match, yet GetHashCode is not supposed to change.

Why does there seem to be this contradiction? Are these recommendations not meant to apply to mutable objects? Probably assumed, but might be worth mentioning I'm referring to classes not structs.

Resolution:

I'm marking JaredPar's answer as accepted, mainly for the interaction in the comments. To sum up what I've learned: the only way to achieve all goals and avoid possible quirky behavior in edge cases is to override Equals and GetHashCode based only on immutable fields, or to implement IEquatable. This seems to diminish the usefulness of overriding Equals for reference types, since from what I've seen most reference types have no immutable fields unless they're stored in a relational database and identified by their primary keys.
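To make the "immutable fields only" conclusion concrete, here is a minimal sketch. The Customer type, its Id and Name members, and the ImmutableKeyDemo helper are hypothetical names invented for illustration; the point is only that equality and hashing read nothing but the get-only Id.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical entity: Id never changes after construction, Name may.
public sealed class Customer : IEquatable<Customer>
{
    public Customer(int id, string name)
    {
        Id = id;
        Name = name;
    }

    public int Id { get; }            // immutable: safe to base the hash on
    public string Name { get; set; }  // mutable: deliberately ignored below

    // Value equality is defined on the immutable key only.
    public bool Equals(Customer other) => other != null && Id == other.Id;

    public override bool Equals(object obj) => Equals(obj as Customer);

    // Depends only on Id, so it stays constant for the object's lifetime.
    public override int GetHashCode() => Id.GetHashCode();
}

public static class ImmutableKeyDemo
{
    // Returns whether the set still finds the customer after its Name changes.
    public static bool StillFoundAfterRename()
    {
        var customer = new Customer(42, "Ada");
        var set = new HashSet<Customer> { customer };
        customer.Name = "Grace";       // does not affect Equals or GetHashCode
        return set.Contains(customer);
    }
}
```

Because Name takes no part in Equals or GetHashCode, mutating it cannot strand the object in the wrong hash bucket, so StillFoundAfterRename() returns true.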

Best Answer

How does that work if the fields that it's based on are mutable?

It doesn't, in the sense that the hash code will change as the object changes. That is a problem for all of the reasons listed in the articles you read. Unfortunately, this is the type of problem that typically only shows up in corner cases, so developers tend to get away with the bad behavior.
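One such corner case can be sketched as follows. Point2D and MutableKeyDemo are hypothetical names, and the hash formula is a deliberately simple one chosen so the two hash codes are easy to check by hand (33 before the mutation, 3102 after).

```csharp
using System.Collections.Generic;

// Hypothetical type whose hash code depends on mutable fields.
public sealed class Point2D
{
    public int X { get; set; }
    public int Y { get; set; }

    public override bool Equals(object obj) =>
        obj is Point2D other && X == other.X && Y == other.Y;

    // Simple illustrative hash: changes whenever X or Y changes.
    public override int GetHashCode() => unchecked(X * 31 + Y);
}

public static class MutableKeyDemo
{
    // Looks the same instance up before and after mutating it.
    public static (bool before, bool after) Run()
    {
        var p = new Point2D { X = 1, Y = 2 };
        var set = new HashSet<Point2D> { p };

        bool before = set.Contains(p);  // hash matches the one stored at insertion

        p.X = 100;                      // hash code is now different
        bool after = set.Contains(p);   // lookup uses the new hash code

        return (before, after);
    }
}
```

The set records the hash code at insertion time, so after the mutation the lookup searches with a hash that no longer matches the stored entry. The object is still in the set (Count is 1), but Contains can no longer find it.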

Also what if I do want dictionary lookups etc to be based on reference equality not my overridden Equals?

As long as you implement an interface like IEquatable<T> this shouldn't be a problem. Most dictionary implementations will choose an equality comparer in a way that will use IEquatable<T> over Object.ReferenceEquals. Even without IEquatable<T>, most will default to calling Object.Equals() which will then go into your implementation.
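If the goal is the opposite, a dictionary that ignores the overridden Equals and looks keys up by identity, one option is to pass an explicit comparer. The sketch below is an assumption about how that could look; ReferenceComparer<T>, Label and ReferenceLookupDemo are hypothetical names. On .NET 5 and later the built-in ReferenceEqualityComparer.Instance should serve the same purpose.

```csharp
using System.Collections.Generic;
using System.Runtime.CompilerServices;

// Compares keys by identity, ignoring any Equals/GetHashCode overrides.
public sealed class ReferenceComparer<T> : IEqualityComparer<T> where T : class
{
    public bool Equals(T x, T y) => ReferenceEquals(x, y);

    // RuntimeHelpers.GetHashCode bypasses a type's GetHashCode override.
    public int GetHashCode(T obj) => RuntimeHelpers.GetHashCode(obj);
}

// Hypothetical key type with value equality on Text.
public sealed class Label
{
    public Label(string text) { Text = text; }
    public string Text { get; }

    public override bool Equals(object obj) => obj is Label other && Text == other.Text;
    public override int GetHashCode() => Text.GetHashCode();
}

public static class ReferenceLookupDemo
{
    // Looks up a value-equal but distinct instance in both kinds of dictionary.
    public static (bool byValue, bool byReference) Run()
    {
        var a = new Label("x");
        var b = new Label("x"); // equal by value, different instance

        var byValue = new Dictionary<Label, int> { [a] = 1 };
        var byReference = new Dictionary<Label, int>(new ReferenceComparer<Label>()) { [a] = 1 };

        return (byValue.ContainsKey(b), byReference.ContainsKey(b));
    }
}
```

The default dictionary treats a and b as the same key, while the one built with the reference comparer only finds the exact instance that was inserted.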

Basically in most of the executing code I want reference equality and I always use == and I'm not overriding that.

If you expect your objects to behave with value equality you should override == and != to enforce value equality for all comparisons. Users can still use Object.ReferenceEquals if they actually want reference equality.
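A minimal sketch of what overriding == and != alongside Equals might look like is shown below. Money and its members are hypothetical, and HashCode.Combine assumes .NET Core 2.1 or later (or an equivalent polyfill).

```csharp
using System;

// Hypothetical immutable value-like class.
public sealed class Money : IEquatable<Money>
{
    public Money(decimal amount, string currency)
    {
        Amount = amount;
        Currency = currency;
    }

    public decimal Amount { get; }
    public string Currency { get; }

    public bool Equals(Money other) =>
        !ReferenceEquals(other, null) && Amount == other.Amount && Currency == other.Currency;

    public override bool Equals(object obj) => Equals(obj as Money);

    public override int GetHashCode() => HashCode.Combine(Amount, Currency);

    // Operators delegate to Equals; ReferenceEquals avoids recursing into ==.
    public static bool operator ==(Money left, Money right) =>
        ReferenceEquals(left, null) ? ReferenceEquals(right, null) : left.Equals(right);

    public static bool operator !=(Money left, Money right) => !(left == right);
}
```

With this in place, two separately constructed Money(5m, "EUR") instances compare equal with ==, while Object.ReferenceEquals still reports that they are different objects.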

I used to assume that the framework always uses == and not Equals to compare things

What the BCL uses has changed a bit over time. Most APIs that need equality now take an IEqualityComparer<T> instance and use it. When one is not specified, they use EqualityComparer<T>.Default to find one. In the worst case this falls back to calling Object.Equals.
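The preference of EqualityComparer<T>.Default for IEquatable<T> can be observed with a small probe. Probe and DefaultComparerDemo are hypothetical names; the type simply records which of its two Equals methods was invoked.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical probe type that records which Equals overload gets called.
public sealed class Probe : IEquatable<Probe>
{
    public static string LastCalled = "none";

    public bool Equals(Probe other)
    {
        LastCalled = "IEquatable<Probe>.Equals";
        return true;
    }

    public override bool Equals(object obj)
    {
        LastCalled = "Object.Equals";
        return true;
    }

    public override int GetHashCode() => 0;
}

public static class DefaultComparerDemo
{
    // Asks the default comparer to compare two probes and reports the path taken.
    public static string Run()
    {
        Probe.LastCalled = "none";
        EqualityComparer<Probe>.Default.Equals(new Probe(), new Probe());
        return Probe.LastCalled;
    }
}
```

Since Probe implements IEquatable<Probe>, the default comparer calls the strongly typed Equals(Probe) rather than the Object.Equals override. Removing the interface from the class declaration would make it fall back to Object.Equals.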

The theory (for the language lawyers and the mathematically inclined):

equals() (javadoc) must define an equivalence relation (it must be reflexive, symmetric, and transitive). In addition, it must be consistent (if the objects are not modified, then it must keep returning the same value). Furthermore, o.equals(null) must always return false.

hashCode() (javadoc) must also be consistent (if the object is not modified in terms of equals(), it must keep returning the same value).

The relation between the two methods is:

Whenever a.equals(b), then a.hashCode() must be same as b.hashCode().

In practice:

If you override one, then you should override the other.

Use the same set of fields that you use to compute equals() to compute hashCode().

Use the excellent helper classes EqualsBuilder and HashCodeBuilder from the Apache Commons Lang library. An example:

public class Person {
    private String name;
    private int age;
    // ...

    @Override
    public int hashCode() {
        return new HashCodeBuilder(17, 31). // two randomly chosen prime numbers
            // if deriving: appendSuper(super.hashCode()).
            append(name).
            append(age).
            toHashCode();
    }

    @Override
    public boolean equals(Object obj) {
        if (!(obj instanceof Person))
            return false;
        if (obj == this)
            return true;

        Person rhs = (Person) obj;
        return new EqualsBuilder().
            // if deriving: appendSuper(super.equals(obj)).
            append(name, rhs.name).
            append(age, rhs.age).
            isEquals();
    }
}

Also remember:

When using a hash-based Collection or Map such as HashSet, LinkedHashSet, HashMap, Hashtable, or WeakHashMap, make sure that the hashCode() of the key objects that you put into the collection never changes while the object is in the collection. The bulletproof way to ensure this is to make your keys immutable, which also has other benefits.

C# – Deep cloning objects

Whereas one approach is to implement the ICloneable interface (described here, so I won't regurgitate), here's a nice deep clone object copier that I found on The Code Project a while ago and incorporated into our code. As mentioned elsewhere, it requires your objects to be serializable.

using System;
using System.IO;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Formatters.Binary;

/// <summary>
/// Reference Article http://www.codeproject.com/KB/tips/SerializedObjectCloner.aspx
/// Provides a method for performing a deep copy of an object.
/// Binary Serialization is used to perform the copy.
/// </summary>
public static class ObjectCopier
{
    /// <summary>
    /// Perform a deep copy of the object via serialization.
    /// </summary>
    /// <typeparam name="T">The type of object being copied.</typeparam>
    /// <param name="source">The object instance to copy.</param>
    /// <returns>A deep copy of the object.</returns>
    public static T Clone<T>(T source)
    {
        if (!typeof(T).IsSerializable)
        {
            throw new ArgumentException("The type must be serializable.", nameof(source));
        }

        // Don't serialize a null object, simply return the default for that object
        if (ReferenceEquals(source, null)) return default;

        // Note: BinaryFormatter is marked obsolete in .NET 5 and later; never use it on untrusted data.
        using var stream = new MemoryStream();
        IFormatter formatter = new BinaryFormatter();
        formatter.Serialize(stream, source);
        stream.Seek(0, SeekOrigin.Begin);
        return (T)formatter.Deserialize(stream);
    }
}

The idea is that it serializes your object and then deserializes it into a fresh object. The benefit is that you don't have to concern yourself with cloning every member by hand when an object gets too complex.

If you prefer to use the extension methods introduced in C# 3.0, change the method to have the following signature:

public static T Clone<T>(this T source)
{
   // ...
}

Now the method call simply becomes objectBeingCloned.Clone();.

EDIT (January 10 2015): Thought I'd revisit this to mention that I recently started using (Newtonsoft) Json to do this. It should be lighter, and it avoids the overhead of [Serializable] tags. (NB: @atconway has pointed out in the comments that private members are not cloned using the JSON method.)

// Requires: using Newtonsoft.Json; (the method must live in a static class, e.g. ObjectCopier)
/// <summary>
/// Perform a deep Copy of the object, using Json as a serialization method. NOTE: Private members are not cloned using this method.
/// </summary>
/// <typeparam name="T">The type of object being copied.</typeparam>
/// <param name="source">The object instance to copy.</param>
/// <returns>The copied object.</returns>
public static T CloneJson<T>(this T source)
{
    // Don't serialize a null object, simply return the default for that object
    if (ReferenceEquals(source, null)) return default;

    // initialize inner objects individually
    // for example in default constructor some list property initialized with some values,
    // but in 'source' these items are cleaned -
    // without ObjectCreationHandling.Replace default constructor values will be added to result
    var deserializeSettings = new JsonSerializerSettings {ObjectCreationHandling = ObjectCreationHandling.Replace};

    return JsonConvert.DeserializeObject<T>(JsonConvert.SerializeObject(source), deserializeSettings);
}
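The same serialize-then-deserialize idea can also be sketched with System.Text.Json, which ships with .NET Core 3.0 and later. This is a swapped-in alternative, not the approach from the original answer; JsonCloner, CloneViaSystemTextJson, Playlist and JsonCloneDemo are hypothetical names. As with the Newtonsoft version, only public properties round-trip by default.

```csharp
using System.Collections.Generic;
using System.Text.Json;

public static class JsonCloner
{
    // Deep copy by round-tripping through JSON (public properties only by default).
    public static T CloneViaSystemTextJson<T>(T source)
    {
        if (ReferenceEquals(source, null)) return default;
        return JsonSerializer.Deserialize<T>(JsonSerializer.Serialize(source));
    }
}

// Hypothetical type with a nested mutable collection.
public sealed class Playlist
{
    public string Title { get; set; }
    public List<string> Tracks { get; set; } = new List<string>();
}

public static class JsonCloneDemo
{
    // Returns the original's track count after the clone has been modified.
    public static int OriginalTrackCountAfterEditingClone()
    {
        var original = new Playlist { Title = "Mix", Tracks = { "a", "b" } };
        var copy = JsonCloner.CloneViaSystemTextJson(original);

        copy.Tracks.Add("c");          // only the clone's list grows
        return original.Tracks.Count;  // the original list is untouched
    }
}
```

Because the clone gets its own List<string>, adding a track to the copy leaves the original with its two tracks, which is the behavior that distinguishes a deep copy from a shallow one.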