Domain-Driven Design – How to Work with Large Aggregate Roots

domain-driven-design

I'm learning DDD and yet I have more questions than answers.

Let's consider a model of a directory containing an enormous number of files.
Here is how I see it:

Directory is an Aggregate root.
This entity holds the validation logic that checks file name uniqueness whenever a file is added or renamed. The File entity contains the 'SetName' logic and notifies the Directory about name changes via a Domain Event.
But how should Directory then work?
It is not always possible to load all files into memory. Should the File repository then have ad-hoc logic for checking name uniqueness? I suppose that is a viable option.
However, what if some files have already been added or renamed within the current, not-yet-committed transaction? (Nothing prohibits that: transaction boundaries are set externally to the business logic.) The repository would probably have to take both the in-memory and the persisted state into account, and merging those states can be a nontrivial task.
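
To make the idea concrete, here is a rough sketch of what I mean; all class and method names (FileRenamed, DirectoryFile, and so on) are mine, purely for illustration:

    import java.util.HashSet;
    import java.util.Set;

    // Domain event raised by a file when its name changes.
    class FileRenamed {
        final String fileId;
        final String newName;
        FileRenamed(String fileId, String newName) {
            this.fileId = fileId;
            this.newName = newName;
        }
    }

    // The File child entity: owns the 'SetName' logic and raises the event.
    class DirectoryFile {
        final String id;
        String name;
        DirectoryFile(String id, String name) {
            this.id = id;
            this.name = name;
        }

        FileRenamed setName(String newName) {
            this.name = newName;
            return new FileRenamed(id, newName); // would be published as a domain event
        }
    }

    // The Directory aggregate root enforces name uniqueness, but only while it
    // can hold all file names in memory, which is exactly my problem.
    class Directory {
        private final Set<String> fileNames = new HashSet<>();

        void onFileRenamed(FileRenamed event) {
            if (!fileNames.add(event.newName)) {
                throw new IllegalStateException("File name must be unique within the directory");
            }
        }
    }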

So, when the aggregate root with all its children fits in memory, everything is fine.
But as soon as you cannot materialize all the entities, the trouble starts.

I'd like to know what the approaches are for such situations.
Maybe there is no problem at all and it is just my misunderstanding of the subject.

Best Answer

My answer is biased by Vaughn Vernon's great book Implementing Domain-Driven Design (a must-read).

1. Favor small aggregates.

If I were to model your domain, I would model Directory as one aggregate and File as another aggregate.

2. Reference aggregates by ids.

Therefore Directory will have a collection of FileId value objects.
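
A minimal sketch of what that could look like (the field and method names here are my own, not prescribed by the book):

    import java.util.HashSet;
    import java.util.Set;

    // FileId is a small value object identified only by its value.
    final class FileId {
        private final String value;
        FileId(String value) { this.value = value; }
        @Override public boolean equals(Object o) {
            return o instanceof FileId && value.equals(((FileId) o).value);
        }
        @Override public int hashCode() { return value.hashCode(); }
    }

    // The Directory aggregate stays small: it only references files by id.
    class Directory {
        private final Set<FileId> fileIds = new HashSet<>();

        void addFile(FileId fileId) { fileIds.add(fileId); }
        void removeFile(FileId fileId) { fileIds.remove(fileId); }
    }

    // File itself would be a separate aggregate holding its own DirectoryId reference.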

3. Use factories to create aggregates.

For a simple case a factory method may be enough: Directory.addFile(FileName fileName). However, for more complex cases I would use a domain factory.
The domain factory could validate that the fileName is unique using a FileRepository and a UniqueFileNameValidator infrastructure service.
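
One possible shape for such a factory is sketched below; the interfaces are my assumptions (plain strings stand in for the FileName and DirectoryId value objects), not an API from the book:

    // Infrastructure service that answers the uniqueness question, typically via a DB query.
    interface UniqueFileNameValidator {
        boolean isUnique(String directoryId, String fileName);
    }

    // Repository used here only to hand out the new file's identity.
    interface FileRepository {
        String nextIdentity();
    }

    final class File {
        final String id;
        final String directoryId;
        final String name;
        File(String id, String directoryId, String name) {
            this.id = id;
            this.directoryId = directoryId;
            this.name = name;
        }
    }

    // Domain factory: the one place where files are created, so the invariant is checked here.
    class FileFactory {
        private final FileRepository files;
        private final UniqueFileNameValidator validator;

        FileFactory(FileRepository files, UniqueFileNameValidator validator) {
            this.files = files;
            this.validator = validator;
        }

        File createFile(String directoryId, String fileName) {
            if (!validator.isUnique(directoryId, fileName)) {
                throw new IllegalArgumentException("File name already used in this directory");
            }
            return new File(files.nextIdentity(), directoryId, fileName);
        }
    }

In practice you would likely still keep a unique constraint in the data store as the final guard against concurrent, not-yet-committed transactions, which also addresses the concern raised in the question.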

Why model File as a separate aggregate?

Because Directories aren't made of Files; a File is associated with a certain Directory. Also, think of a directory that has thousands of files: loading all those objects into memory each time the directory is fetched is a performance killer.

Model your aggregates according to your use cases. If you know that there will never be more than 2-3 files in a directory, then you can model them all as a single aggregate, but in my experience business rules change all the time, and it pays if your model is flexible enough to accommodate those changes.

Obligatory read: Effective Aggregate Design by Vaughn Vernon.