C# – Use a HashSet or List for a Collection of Entities

c#, domain-driven-design

I have a class, which looks like this:

using System.Collections.Generic;

public class Customer
{
    // Backing store; callers only ever see it through Orders.
    private readonly IList<Order> _orders = new List<Order>();

    public FirstName FirstName { get; set; }
    public LastName LastName { get; set; }
    public Province Province { get; set; }

    // Iterator block, so callers cannot cast the result back to the list.
    public IEnumerable<Order> Orders
    {
        get { foreach (var order in _orders) yield return order; }
    }

    internal void AddOrder(Order order)
    {
        _orders.Add(order);
    }

    // Planning to add an AddRange method to add a collection of Orders.
}

Someone suggested to me recently that I should be using a set instead of a list, since the collection is exposed as an IEnumerable and I do not need the indexing that a list provides.

If I change this to a set, then how do I implement equality? I believe I have a few options:

1) Leave it as a list.

2) Do nothing – then I believe orders are unique based on reference equality. Is there any risk in doing this? Everything I read tells me that you must override GetHashCode and Equals if you are using a HashSet. Here is an example of a set where the default Object.GetHashCode and Object.Equals appear to be used with a HashSet: https://github.com/nhibernate/nhibernate-core/blob/master/src/NHibernate.DomainModel/Northwind/Entities/Order.cs

3) Override Equals and GetHashCode in the Order class so that Orders are equal if they have the same ID. This link suggests that you should not do this: https://www.youtube.com/watch?v=xRCOKKUSp9s

4) Create an Entity base class similar to this: https://github.com/VaughnVernon/IDDD_Samples_NET/blob/master/iddd_common/Domain.Model/Entity.cs. The YouTube video in point 3 appears to advise against this.

5) Implement an IEqualityComparer. The research I have done suggests this is a bad idea because (a) I will have to inject or pass a comparer into the entity, and (b) the code for establishing whether two orders are equal lives in a different class from Order, making the domain model anemic.

I am trying to follow the principle of least astonishment and find myself going round in circles sometimes trying to achieve this. A lot of the links above are several years old. Is there a standard way to approach this?

Best Answer

Allow me to approach the answer from different angles.

Premature optimization

You have been told that other collections are, apparently, more suitable for your case. That is probably true, but I don't find the arguments given strong enough to convince me that List is utterly out of place.

Ask yourself whether HashSet solves a real, known issue with the current implementation. Ask as well whether List's unused capabilities are actually counterproductive. Finally, evaluate whether the effort of changing the implementation buys a big improvement. But don't spend too much time on it.

Spending too much time answering these questions (or refactoring prematurely) instead of solving the problem diminishes the benefits of either implementation. Solving problems you don't have is premature optimization.

As @MetaFight says, your time is more valuable than CPU cycles.

Equality

What equality means should be up to you to decide. The ubiquitous language has much to say here. If it doesn't, ask the domain experts when two Orders are equal and when they are exactly the same.

Checking these conditions should not depend on the collection type you choose. One type can make them easier to check than another, but not at the cost of conditioning the design.

Keep the following in mind: equal and same are different things.

Two references could be:

  • pointing to the same instance (in memory)

  • pointing to different instances whose data are equal. If Equals and GetHashCode are overridden consistently, both also have the same hash code.

  • pointing to different instances with different data that still have the same hash code (a collision). Rare, but possible.

  • pointing to different instances with different data that the domain considers equal, even though the collection does not.
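To make the first two cases concrete, here is a minimal sketch. The Order class is hypothetical (the question does not show Order), and it assumes equality is based on a Guid Id, which is only one of the options being discussed, not a recommendation.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical Order, for illustration only: equality is assumed to be Id-based.
public sealed class Order
{
    public Order(Guid id) { Id = id; }

    public Guid Id { get; }

    public override bool Equals(object obj) => obj is Order other && other.Id == Id;

    // Kept consistent with Equals: equal orders always produce equal hash codes.
    public override int GetHashCode() => Id.GetHashCode();
}

public static class SameVersusEqual
{
    public static void Run()
    {
        var id = Guid.NewGuid();
        var a = new Order(id);
        var b = new Order(id); // different instance, same data
        var c = a;             // second reference to the same instance

        Console.WriteLine(ReferenceEquals(a, c));              // True: same
        Console.WriteLine(ReferenceEquals(a, b));              // False: not the same...
        Console.WriteLine(a.Equals(b));                        // True: ...but equal
        Console.WriteLine(a.GetHashCode() == b.GetHashCode()); // True

        // The collection only knows about Equals/GetHashCode, not about "same".
        var set = new HashSet<Order> { a, b };
        Console.WriteLine(set.Count);                          // 1
    }
}
```

Without the two overrides, Equals falls back to reference equality and the set would hold both a and b.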

Delegating the control of equality and sameness to the collection might not be enough for you to express the business rules.

For example, if having the same Order twice in the same collection is not allowed but having two equal ones is, it would be good to make this constraint explicit with your own classes and functions [3], so that other developers can "read" what equal and same mean and when these characteristics matter.

The reason for all of this is simple: expressiveness. [2]

Implementing code that expresses the difference between "same" and "equal" [1] follows the principle of least astonishment, since readers don't need to be familiar with the SDK.

Don't make us dig through the SDK to find out what you are trying to do.
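As a rough sketch of what "explicit" could look like, assume the rule really is "the same order may not be added twice, but equivalent orders may". The names IsSameAs, IsEquivalentTo and Reference are made up for this example, and your domain may define both notions differently.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical Order: Reference stands in for whatever the domain compares.
public sealed class Order
{
    public Order(string reference) { Reference = reference; }

    public string Reference { get; }

    // Domain vocabulary spelled out, instead of hidden in Equals/GetHashCode.
    public bool IsSameAs(Order other) => ReferenceEquals(this, other);

    public bool IsEquivalentTo(Order other) => other != null && other.Reference == Reference;
}

public sealed class Customer
{
    private readonly List<Order> _orders = new List<Order>();

    public IEnumerable<Order> Orders => _orders.AsReadOnly();

    public void AddOrder(Order order)
    {
        if (order == null) throw new ArgumentNullException(nameof(order));

        // Assumed rule: the same order cannot be added twice; equivalent ones can.
        if (_orders.Any(existing => existing.IsSameAs(order)))
            throw new InvalidOperationException("This order has already been added.");

        _orders.Add(order);
    }
}
```

With this shape, adding two distinct orders that share a Reference succeeds, while adding the same instance a second time throws. The rule is readable in AddOrder itself, and it keeps holding if the backing List is later replaced by another collection.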

Testing

Long story short: if we delegate the responsibility to the collection, then when it comes to testing we end up testing the collection instead of the business rules.

KISS

One collection could be more suitable than the others. But if List meets the needs and is the simplest solution, then go with List. Which collection to use is a technical detail. Ideally, the domain model is agnostic to this sort of detail. If it is, it's easier to change the implementation from one collection to another.
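One way to keep that choice a private detail, sketched under the assumption that callers only need to enumerate and count orders: declare the field as ICollection<Order>, which both List<T> and HashSet<T> implement. The empty Order class and the OrderCount property are placeholders added for this example.

```csharp
using System.Collections.Generic;

// Placeholder Order, only here so the sketch compiles on its own.
public sealed class Order { }

public sealed class Customer
{
    // Only this line changes if the backing collection is swapped.
    private readonly ICollection<Order> _orders = new List<Order>();
    // private readonly ICollection<Order> _orders = new HashSet<Order>();

    // Callers see a sequence, never the concrete collection type.
    public IEnumerable<Order> Orders
    {
        get { foreach (var order in _orders) yield return order; }
    }

    public int OrderCount => _orders.Count;

    internal void AddOrder(Order order) => _orders.Add(order);
}
```

Note that the swap is not behavior-neutral: through ICollection<T>.Add, a HashSet silently ignores an item it already contains, while a List stores it again. That is one more reason to state any uniqueness rule in the domain code rather than rely on the collection.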


1: Or any other constraint of the domain

2: I suggest reading @VoiceOfUnreason's answer for more insight on this subject

3: Or other elements of the domain
