Domain-Driven Design – How to Work with Large Aggregate Roots

domain-driven-design

I'm learning DDD and yet I have more questions than answers.

Let's consider a model of a directory containing an enormous number of files.
Here is how I see it:

Directory is an Aggregate root.
This entity holds the validation logic that checks file name uniqueness whenever a file is added or renamed. The File entity contains the 'SetName' logic and notifies the Directory about name changes via a Domain Event.
But how should Directory then work?
It is not always possible to load all files into memory. Should the File repository then have ad-hoc logic for checking name uniqueness? I suppose that is a viable option.
However, what if some files have already been added or renamed within the current, not-yet-committed transaction? (Nothing prohibits that: transaction boundaries are set externally to the business logic.) The repository would probably have to take both the in-memory and the persisted state into account, and merging those states can be a nontrivial task.
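
To make the idea concrete, here is a rough sketch of what I mean; all class and method names (FileRenamed, DirectoryFile, and so on) are mine, purely for illustration:

    import java.util.HashSet;
    import java.util.Set;

    // Domain event raised by a file when its name changes.
    class FileRenamed {
        final String fileId;
        final String newName;
        FileRenamed(String fileId, String newName) {
            this.fileId = fileId;
            this.newName = newName;
        }
    }

    // The File child entity: owns the 'SetName' logic and raises the event.
    class DirectoryFile {
        final String id;
        String name;
        DirectoryFile(String id, String name) {
            this.id = id;
            this.name = name;
        }

        FileRenamed setName(String newName) {
            this.name = newName;
            return new FileRenamed(id, newName); // would be published as a domain event
        }
    }

    // The Directory aggregate root enforces name uniqueness, but only while it
    // can hold all file names in memory, which is exactly my problem.
    class Directory {
        private final Set<String> fileNames = new HashSet<>();

        void onFileRenamed(FileRenamed event) {
            if (!fileNames.add(event.newName)) {
                throw new IllegalStateException("File name must be unique within the directory");
            }
        }
    }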

So, when the aggregate root with all its children fits in memory, everything is fine.
But as soon as you cannot materialize all the entities, the trouble starts.

I'd like to know what the approaches are for such situations.
Maybe there is no problem at all and it is just my misunderstanding of the subject.

Best Answer

My answer is biased by Vaughn Vernon's great book Implementing Domain-Driven Design (a must-read).

1. Favor small aggregates.

If I were to model your domain, I would model Directory as one aggregate and File as another aggregate.

2. Reference aggregates by ids.

Therefore Directory will have a collection of FileId value objects.
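
A minimal sketch of what that could look like (the field and method names here are my own, not prescribed by the book):

    import java.util.HashSet;
    import java.util.Set;

    // FileId is a small value object identified only by its value.
    final class FileId {
        private final String value;
        FileId(String value) { this.value = value; }
        @Override public boolean equals(Object o) {
            return o instanceof FileId && value.equals(((FileId) o).value);
        }
        @Override public int hashCode() { return value.hashCode(); }
    }

    // The Directory aggregate stays small: it only references files by id.
    class Directory {
        private final Set<FileId> fileIds = new HashSet<>();

        void addFile(FileId fileId) { fileIds.add(fileId); }
        void removeFile(FileId fileId) { fileIds.remove(fileId); }
    }

    // File itself would be a separate aggregate holding its own DirectoryId reference.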

3. Use factories to create aggregates.

For a simple case a factory method may be enough: Directory.addFile(FileName fileName). However, for more complex cases I would use a domain factory.
The domain factory could validate that the fileName is unique using a FileRepository and a UniqueFileNameValidator infrastructure service.
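
One possible shape for such a factory is sketched below; the interfaces are my assumptions (plain strings stand in for the FileName and DirectoryId value objects), not an API from the book:

    // Infrastructure service that answers the uniqueness question, typically via a DB query.
    interface UniqueFileNameValidator {
        boolean isUnique(String directoryId, String fileName);
    }

    // Repository used here only to hand out the new file's identity.
    interface FileRepository {
        String nextIdentity();
    }

    final class File {
        final String id;
        final String directoryId;
        final String name;
        File(String id, String directoryId, String name) {
            this.id = id;
            this.directoryId = directoryId;
            this.name = name;
        }
    }

    // Domain factory: the one place where files are created, so the invariant is checked here.
    class FileFactory {
        private final FileRepository files;
        private final UniqueFileNameValidator validator;

        FileFactory(FileRepository files, UniqueFileNameValidator validator) {
            this.files = files;
            this.validator = validator;
        }

        File createFile(String directoryId, String fileName) {
            if (!validator.isUnique(directoryId, fileName)) {
                throw new IllegalArgumentException("File name already used in this directory");
            }
            return new File(files.nextIdentity(), directoryId, fileName);
        }
    }

In practice you would likely still keep a unique constraint in the data store as the final guard against concurrent, not-yet-committed transactions, which also addresses the concern raised in the question.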

Why model File as a separate aggregate?

Because Directories aren't made of Files; a File is associated with a certain Directory. Also, think of a directory that has thousands of files: loading all those objects into memory each time the directory is fetched is a performance killer.

Model your aggregates according to your use cases. If you know that there will never be more than 2-3 files in a directory, then you can model them all as a single aggregate, but in my experience business rules change all the time, and it pays if your model is flexible enough to accommodate those changes.

Obligatory read: Effective Aggregate Design by Vaughn Vernon.