Domain-Driven Design – How to Model Domain Events for Denormalized Read Models

cqrs, domain-driven-design, event-sourcing

I have been working for some time with software that relies heavily on the event sourcing and CQRS patterns, and I'm trying to figure out how to properly model the domain events in relation to the needs of the read models.

Let me describe the question in a more formal way. In an event-sourced application, a domain event is an object that represents something that happened in the domain and that is relevant to the application. By definition, a domain event should be self-contained: it should carry all the relevant information about what happened in the domain, so that it is immediately usable without querying other sources.

Usually, when event sourcing is used along with CQRS, the events are published through some sort of event bus (it could be as simple as an in-memory event bus or as complex as a full-fledged service bus), so that all the interested denormalizers can subscribe to them and update a read model accordingly. In all the examples I have seen, the denormalizer code simply reads the properties contained in the event and performs some CRUD operation on the read model (usually a relational or a document database): the only source of truth is the event, and no other source is queried to get the information needed to update the read model. This makes sense to me. This way all the read models are independent and can be rebuilt freely, because they don't depend on each other (reading read model A while handling an event in order to update read model B is considered an anti-pattern).

Provided that all the previous statements are correct (please let me know if they aren't), here is my doubt: how can I properly design the shape of a domain event (that is, its properties) if I don't know in advance what all the possible read models of my system will be? In general we don't know, for instance because requirements change.

Just to better clarify the point, imagine an e-commerce domain where users are able to purchase books. One of the most important domain events is the following:

public class BookPurchased
{
  public Guid UserId { get; set; }
  public Guid BookId { get; set; }
  public DateTime PurchaseDate { get; set; }
  public Guid PurchaseId { get; set; }
}

Imagine that at some point in time the need to support mobile clients arises. In this scenario we need a highly denormalized read model for the client, so that with a single call to a REST API it can get all the data it needs to render a detail page about the purchase (ideally we would like a fast API that reads the data with a single query to a single read model, without aggregating data retrieved with multiple queries).
In this case, the previously designed event is not well suited to a denormalizer in charge of filling the new read model, because information such as the full name of the user or the book title is missing.
A better shape for the event, in this new scenario, could be the following:

public class BookPurchased
{
  public Guid UserId { get; set; }
  public string UserFirstName { get; set; }
  public string UserLastName { get; set; }
  public Guid BookId { get; set; }
  public string BookTitle { get; set; }
  public DateTime PurchaseDate { get; set; }
  public Guid PurchaseId { get; set; }
}

What is the correct way to go in this kind of scenario?

Should we model "fat" events in advance, containing all the possible information (even if it is not strictly needed for the known application use cases), so that we can support highly denormalized read models if we ever need them?

Putting it another way, is it fine to include some properties in a domain event class only to serve the needs of a single specific read model?

Or should we opt for small and focused read models (so that we can have thinner domain events and a faster rebuild process) and use some sort of "smart aggregator" (like GraphQL) when we need to create highly denormalized views of the current state of the system?

Best Answer

The domain events should contain the data that is required and owned by the write model, that is, the Aggregate. If the read model needs additional data that is not strictly required by the Aggregate but is still owned by it, then you may add this information to the domain event if this would really make your read model much simpler. For example, you may include the old/previous value of some property.
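As a minimal sketch of the "previous value" case, assume a hypothetical UserEmailChanged event emitted by the User Aggregate and a read model indexed by email. The event name, its properties and the read model are illustrative assumptions and do not come from the question. Both values are owned by the same Aggregate, so carrying the old one in the event does not couple it to anything else:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical event: both emails are owned by the User Aggregate.
public class UserEmailChanged
{
    public Guid UserId { get; set; }
    public string PreviousEmail { get; set; } // old value, included only to simplify read models
    public string NewEmail { get; set; }
}

// Hypothetical read model that looks users up by email.
public class UsersByEmailReadModel
{
    private readonly Dictionary<string, Guid> _userIdByEmail = new Dictionary<string, Guid>();

    public void Handle(UserEmailChanged e)
    {
        // Thanks to PreviousEmail the stale key can be removed without
        // scanning the index or keeping a reverse lookup.
        if (e.PreviousEmail != null)
        {
            _userIdByEmail.Remove(e.PreviousEmail);
        }
        _userIdByEmail[e.NewEmail] = e.UserId;
    }

    public bool TryFind(string email, out Guid userId) =>
        _userIdByEmail.TryGetValue(email, out userId);
}
```

Without PreviousEmail, this read model would have to remember which email belongs to which user just to be able to delete the old entry.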

In your case, the additional information needed by the read model is not owned by the Aggregate (so it seems, anyway). This means that you should not include it. Otherwise you would have to pass it in the command as well, all the way down to the domain event. The Aggregate would need to forward it by including it in the domain event, which means the Aggregate is forced to know about other Aggregates, maybe even other bounded contexts.

You may, however, subscribe to the domain events emitted by other Aggregates. In your case, the read model can subscribe to the UserCreated event emitted by the User Aggregate and maintain a private/local state of user names. When the BookPurchased event comes in, it can look up UserFirstName and UserLastName in that private state.
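A rough sketch of such a denormalizer follows. The shape of UserCreated, the PurchaseDetails row and the in-memory dictionaries (standing in for real storage) are assumptions made for illustration; only BookPurchased is taken from the question, in its thin form:

```csharp
using System;
using System.Collections.Generic;

// Assumed shape of the event emitted by the User Aggregate.
public class UserCreated
{
    public Guid UserId { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

// The thin event from the question, unchanged.
public class BookPurchased
{
    public Guid UserId { get; set; }
    public Guid BookId { get; set; }
    public DateTime PurchaseDate { get; set; }
    public Guid PurchaseId { get; set; }
}

// Hypothetical row of the denormalized read model.
public class PurchaseDetails
{
    public Guid PurchaseId { get; set; }
    public Guid BookId { get; set; }
    public DateTime PurchaseDate { get; set; }
    public string UserFullName { get; set; }
}

public class PurchaseDetailsDenormalizer
{
    // Private state owned by this read model only.
    private readonly Dictionary<Guid, string> _userNames = new Dictionary<Guid, string>();
    private readonly Dictionary<Guid, PurchaseDetails> _purchases = new Dictionary<Guid, PurchaseDetails>();

    public void Handle(UserCreated e) =>
        _userNames[e.UserId] = $"{e.FirstName} {e.LastName}";

    public void Handle(BookPurchased e)
    {
        // Enrich the thin event from local state instead of fattening the event.
        _userNames.TryGetValue(e.UserId, out var fullName);
        _purchases[e.PurchaseId] = new PurchaseDetails
        {
            PurchaseId = e.PurchaseId,
            BookId = e.BookId,
            PurchaseDate = e.PurchaseDate,
            UserFullName = fullName ?? "(unknown user)"
        };
    }

    public PurchaseDetails Find(Guid purchaseId) =>
        _purchases.TryGetValue(purchaseId, out var p) ? p : null;
}
```

The book title could be handled the same way, by subscribing to whatever event the Book Aggregate emits when a title is set. Note that rebuilding this read model then requires replaying the events of both Aggregates, not only BookPurchased.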

Lately, in order to limit the proliferation and duplication of private states in read models, I have started to use Queries. The Queries are answered by some canonical read models. The answers may also be pushed when they get updated. Although I use my own framework for this, for more information you may take a look at the Queries section of the Axon Framework documentation.
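To give an idea of the pull style only (pushed updates are left out), here is a small sketch. The IUserNameQuery contract and both read models are hypothetical names invented for this example; this is not the Axon API and not my framework's API:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical query contract answered by a canonical read model.
public interface IUserNameQuery
{
    string GetFullName(Guid userId); // returns null when the user is unknown
}

// Canonical read model: the single place where user names are kept.
public class UserNamesReadModel : IUserNameQuery
{
    private readonly Dictionary<Guid, string> _names = new Dictionary<Guid, string>();

    // Called by whatever handles the User Aggregate's events.
    public void Set(Guid userId, string fullName) => _names[userId] = fullName;

    public string GetFullName(Guid userId) =>
        _names.TryGetValue(userId, out var name) ? name : null;
}

// Another read model that keeps no private copy of the user names.
public class PurchaseSummaries
{
    private readonly IUserNameQuery _userNames;
    private readonly Dictionary<Guid, string> _buyerByPurchase = new Dictionary<Guid, string>();

    public PurchaseSummaries(IUserNameQuery userNames) => _userNames = userNames;

    public void OnBookPurchased(Guid purchaseId, Guid userId) =>
        _buyerByPurchase[purchaseId] = _userNames.GetFullName(userId) ?? "(unknown user)";

    public string BuyerOf(Guid purchaseId) =>
        _buyerByPurchase.TryGetValue(purchaseId, out var buyer) ? buyer : null;
}
```

The trade-off is that PurchaseSummaries now depends on the canonical read model being up to date when it handles an event, in exchange for not duplicating the user names in every read model that needs them.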