CQRS Command – How to Validate and Transform to Domain Object


I have been using a poor man's CQRS¹ for quite some time now. I love its flexibility: granular data lives in one data store, which opens up great possibilities for analysis and thus increases business value, and, when needed, a second store serves reads from denormalized data for better performance.

Unfortunately, almost from the beginning I have struggled with where exactly to place business logic in this type of architecture.

From what I understand, a command is a means of communicating intent and is not tied to a domain by itself. Commands are basically (dumb, if you wish) data transfer objects, which makes them easy to transfer between different technologies. The same applies to events, which are emitted in response to successfully completed commands.

In a typical DDD application the business logic resides within entities, value objects and aggregate roots, which are rich in both data and behavior. A command, however, is not a domain object, so it should not be restricted to domain representations of data, because that would put too much strain on it.

So the real question is: Where exactly is the logic?

I have found that I face this struggle most often when constructing a fairly complicated aggregate that imposes rules on combinations of its values. Also, when modeling domain objects I like to follow the fail-fast paradigm, so that I know an object is in a valid state whenever it reaches a method.

Let's say an aggregate Car uses two components:

Transmission,
Engine.

Transmission and Engine are both value objects modeled as supertypes with corresponding subtypes: Automatic and Manual transmissions, and Petrol and Electric engines, respectively.

In this domain, a successfully created Transmission (Automatic or Manual) or Engine (of either type) is perfectly valid on its own. But the Car aggregate introduces a few new rules that apply only when Transmission and Engine objects are used in the same context, namely:

When a car uses an Electric engine, the only allowed transmission type is Automatic.
When a car uses a Petrol engine, it may have either type of Transmission.
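A minimal Python sketch of these rules as an invariant of the Car aggregate, assuming the aggregate checks the engine/transmission combination in its constructor so that an invalid Car can never exist. The class InvalidCarConfiguration is a hypothetical name, and the value objects are reduced to empty marker classes:

```python
class Transmission:
    """Value object supertype; each subtype is valid on its own."""


class AutomaticTransmission(Transmission):
    pass


class ManualTransmission(Transmission):
    pass


class Engine:
    """Value object supertype; each subtype is valid on its own."""


class PetrolEngine(Engine):
    pass


class ElectricEngine(Engine):
    pass


class InvalidCarConfiguration(Exception):
    """Raised when individually valid components cannot be combined."""


class Car:
    """Aggregate root that owns the rule about combining its components."""

    def __init__(self, engine: Engine, transmission: Transmission) -> None:
        # Fail fast: the combination rule is checked before the object exists,
        # so every Car that reaches a method is already in a valid state.
        if isinstance(engine, ElectricEngine) and not isinstance(
            transmission, AutomaticTransmission
        ):
            raise InvalidCarConfiguration(
                "an electric car must use an automatic transmission"
            )
        # A petrol engine accepts either transmission, so nothing else to check.
        self.engine = engine
        self.transmission = transmission
```

With this shape, a command only has to carry the chosen components; whether they fit together is decided by the aggregate itself.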

I could catch this invalid combination of components when creating the command, but as stated above, from what I understand that should not be done, because the command would then contain business logic, which should be confined to the domain layer.

Another option is to move this validation into a command validator, but that does not seem right either. It feels like I would be taking the command apart: retrieving its properties through getters, comparing them inside the validator and inspecting the results. To me that screams violation of the Law of Demeter.

Having discarded that option as not viable, it seems one should take the command and construct the aggregate from it. But where should this logic live? Should it be in the command handler responsible for the concrete command? Or should it perhaps be in the command validator (I don't like this approach either)?

I currently take a command and create the aggregate from it in the responsible command handler. But with this approach, a command validator, if I had one, would contain nothing at all: a CreateCar command would hold components that I know are valid individually, yet the aggregate might still reject their combination.

Let's imagine a different scenario that mixes different kinds of validation: creating a new user with a CreateUser command.

The command contains the Id of the user to be created and their Email.

The system imposes the following rules on a user's email address:

must be unique,
must not be empty,
must have at most 100 characters (max length of a db column).

In this case, even though having a unique email is a business rule, checking it in an aggregate makes very little sense, because I would need to load the entire set of current emails in the system into memory and check the email in the command against the aggregate (Eeeek! Something, something, performance.). Because of that, I would move this check to the command validator, which would take UserRepository as a dependency and use it to check whether a user with the command's email already exists.
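A rough Python sketch of the validator described above. The repository method exists_with_email and the InMemoryUserRepository test double are assumptions for illustration; the point is that the validator asks the repository one targeted question instead of loading every email into memory:

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class CreateUser:
    """Dumb command: only the data describing the intent."""
    user_id: str
    email: str


class UserRepository(Protocol):
    # Hypothetical query method; a real one would hit an indexed column.
    def exists_with_email(self, email: str) -> bool: ...


class EmailAlreadyTaken(Exception):
    pass


class CreateUserValidator:
    """Checks the set-based rule (uniqueness) that a single aggregate cannot see."""

    def __init__(self, users: UserRepository) -> None:
        self._users = users  # injected dependency

    def validate(self, command: CreateUser) -> None:
        if self._users.exists_with_email(command.email):
            raise EmailAlreadyTaken(command.email)


class InMemoryUserRepository:
    """Test double standing in for a database-backed repository."""

    def __init__(self, emails: set) -> None:
        self._emails = emails

    def exists_with_email(self, email: str) -> bool:
        return email in self._emails
```

This is a check-then-act sequence: two concurrent requests can both pass it, so a unique index on the email column remains the final safeguard.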

At this point it suddenly makes sense to put the other two email rules in the command validator as well. But I have a feeling that those rules really belong in the User aggregate, and that the command validator should check only uniqueness; if validation succeeds, I should create the User aggregate in the CreateUserCommandHandler and pass it to a repository to be saved.

I feel this way because the repository's save method will most likely accept an aggregate, which guarantees that all invariants hold by the time the aggregate is passed in. If the logic (e.g. the non-emptiness check) lives only in the command validator, another programmer could skip validation entirely and call save on the UserRepository with a User object directly, which could lead to a fatal database error because the email might be too long.
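One way to keep the non-emptiness and length rules in the domain, sketched in Python under the assumption that the email is modeled as a value object. Email, User, handle_create_user and this in-memory UserRepository are hypothetical names:

```python
from dataclasses import dataclass

MAX_EMAIL_LENGTH = 100  # matches the max length of the database column


@dataclass(frozen=True)
class Email:
    value: str

    def __post_init__(self) -> None:
        # The invariants live in the domain, so no caller can skip them.
        if not self.value.strip():
            raise ValueError("email must not be empty")
        if len(self.value) > MAX_EMAIL_LENGTH:
            raise ValueError(f"email must have at most {MAX_EMAIL_LENGTH} characters")


@dataclass(frozen=True)
class User:
    user_id: str
    email: Email  # a User can only be built from an already valid Email


class UserRepository:
    """In-memory stand-in; save() accepts only a User aggregate."""

    def __init__(self) -> None:
        self._rows = {}

    def save(self, user: User) -> None:
        # No length check needed here: a User with an over-long email cannot exist.
        self._rows[user.user_id] = user.email.value

    def email_of(self, user_id: str) -> str:
        return self._rows[user_id]


def handle_create_user(repository: UserRepository, user_id: str, email: str) -> None:
    """Body of a CreateUserCommandHandler; uniqueness is assumed checked earlier."""
    repository.save(User(user_id, Email(email)))
```

Because save() only accepts a User, and a User only holds a valid Email, bypassing the command validator can no longer push an over-long email into the database.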

How do you personally handle these complex validations and transformations? I am mostly happy with my solution, but I need some affirmation that my ideas and approaches are not completely stupid before I can be truly happy with these choices. I am entirely open to completely different approaches. If you have tried something yourself that worked very well for you, I would love to see it.

¹ As a PHP developer building RESTful systems, my interpretation of CQRS deviates a little from the standard asynchronous command-processing approach; for example, I sometimes return results from commands because they have to be processed synchronously.

Best Answer

The following answer is in the context of the CQRS style promoted by cqrs.nu, in which commands are delivered directly to the aggregates. In this architectural style, application services are replaced by an infrastructure component (the CommandDispatcher) that identifies the aggregate, loads it, sends it the command and then persists the aggregate (as a series of events if event sourcing is used).

> So the real question is: Where exactly is the logic?

There are multiple kinds of (validation) logic. The general idea is to execute the logic as early as possible (fail fast, if you like). The cases are as follows:

the structure of the command object itself: the command's constructor has required fields that must be present for the command to be created. This is the first and fastest validation, and it obviously lives in the command.
low-level field validation, such as the non-emptiness of some fields (e.g. the username) or their format (e.g. a valid email address). This kind of validation should also live inside the command itself, in its constructor. Another style uses an isValid method, but that seems pointless to me: someone would have to remember to call it, when successful command instantiation should suffice.
separate command validators, classes whose responsibility is to validate a command. I use this kind of validation when I need to check information from multiple aggregates or external sources, for example to check the uniqueness of a username. Command validators can have any dependencies injected, such as repositories. Keep in mind that this validation is only eventually consistent with the aggregate (i.e. between the check and the creation of the user, another user with the same username could be created)! Also, do not put logic here that belongs inside the aggregate! Command validators are different from Sagas/Process managers, which generate commands based on events.
the aggregate methods that receive and process the commands. This is the last kind of validation that occurs. The aggregate extracts the data from the command and, using core business logic, either accepts it (changing its state) or rejects it. This logic is checked in a strongly consistent manner; it is the last line of defense. In your example, the rule "When a car uses an Electric engine, the only allowed transmission type is Automatic" should be checked here.
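The first two kinds can be sketched as a Python command whose constructor does the checking. RegisterUser and InvalidCommand are hypothetical names, and the regular expression is a deliberately crude format check, not a complete email validator:

```python
import re
from dataclasses import dataclass

# Deliberately simplistic: something@something.something, no whitespace.
_EMAIL_FORMAT = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


class InvalidCommand(ValueError):
    pass


@dataclass(frozen=True)
class RegisterUser:
    # 1. Structure: both fields are required constructor arguments, so a
    #    RegisterUser without them cannot be instantiated at all.
    username: str
    email: str

    def __post_init__(self) -> None:
        # 2. Low-level field validation runs on every instantiation, so there
        #    is no separate isValid() method for callers to forget.
        if not self.username.strip():
            raise InvalidCommand("username must not be empty")
        if not _EMAIL_FORMAT.match(self.email):
            raise InvalidCommand(f"not a valid email address: {self.email!r}")
```

Anything that needs other aggregates or external data (kind 3) or the aggregate's own state (kind 4) stays out of this class.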

> I feel like this because the repository's save method is likely to accept an aggregate which ensures that once the aggregate is passed all invariants are fulfilled. When the logic (e.g. the non-emptiness) is only present within the command validation itself another programmer could completely skip this validation and call the save method in the UserRepository with a User object directly which could lead to a fatal database error, because the email might have been too long.

With the techniques above, nobody can create invalid commands or bypass the logic inside the aggregates. Command validators are automatically loaded and called by the CommandDispatcher, so nobody can send a command directly to the aggregate. One could call a method on the aggregate, passing it a command, but the changes could not be persisted, so doing so would be pointless and harmless.

> Working as a PHP developer responsible for creating RESTful systems my interpretation of CQRS deviates a little from the standard async-command-processing approach, such as sometimes returning results from commands due to the need of processing commands synchronously.

I'm also a PHP programmer, and I don't return anything from my command handlers (aggregate methods of the form handleSomeCommand). I do, however, quite often return information to the client/browser in the HTTP response, for example the ID of the newly created aggregate root or something from a read model, but I never (really never) return anything from my aggregate command methods. The simple fact that the command was accepted (and processed: we are talking about synchronous PHP processing, right?!) is sufficient.
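One way to return the new ID in the HTTP response while the command handler returns nothing is to generate the ID before dispatching. A Python sketch under that assumption; post_cars, CreateCar and RecordingDispatcher are hypothetical names, and the "HTTP action" is reduced to a function returning a status code and a body:

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class CreateCar:
    car_id: str
    model: str


class RecordingDispatcher:
    """Stand-in for a synchronous CommandDispatcher; dispatch() returns nothing."""

    def __init__(self) -> None:
        self.dispatched = []

    def dispatch(self, command) -> None:
        # A real dispatcher would validate, load, handle and persist here,
        # raising an exception if the command is rejected.
        self.dispatched.append(command)


def post_cars(dispatcher: RecordingDispatcher, body: dict) -> tuple:
    """Hypothetical HTTP action returning (status code, JSON-like body)."""
    # The ID is chosen up front, so the handler never has to return it.
    car_id = str(uuid.uuid4())
    dispatcher.dispatch(CreateCar(car_id, body["model"]))
    # Reaching this line means the command was accepted and processed.
    return 201, {"id": car_id}
```

The response body could just as well be filled from a read model after dispatch; either way, the aggregate's command method itself stays void.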

We can return something to the browser and still be doing CQRS by the book, because CQRS is not a high-level architecture.

An example of how command validators work:
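A minimal Python sketch, assuming validators are registered per command type and the dispatcher runs all of them before handing the command on to the code that loads the aggregate, calls it and persists it (all names here are hypothetical):

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class RegisterUser:
    user_id: str
    username: str


class ValidationFailed(Exception):
    pass


class UniqueUsernameValidator:
    """A command validator; its dependency (taken usernames) is injected."""

    def __init__(self, taken_usernames: set) -> None:
        self._taken = taken_usernames

    def validate(self, command: RegisterUser) -> None:
        # Only eventually consistent: the name could be taken right after this check.
        if command.username in self._taken:
            raise ValidationFailed(f"username {command.username!r} is already taken")


class CommandDispatcher:
    def __init__(self, validators: dict, handlers: dict[type, Callable[[object], None]]) -> None:
        self._validators = validators  # command type -> list of validators
        self._handlers = handlers      # command type -> load/handle/persist callable

    def dispatch(self, command) -> None:
        # Every registered validator runs first, so callers cannot skip them.
        for validator in self._validators.get(type(command), []):
            validator.validate(command)
        self._handlers[type(command)](command)
```

Since the dispatcher is the only path to persistence, registering a validator once is enough to have it applied to every command of that type.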

Related Solutions

Domain-Driven Design – Should the Repository Be in the Domain Object or Service Layer?

You can take either approach and have it work well - there are, of course, pros and cons.

Entity Framework is definitely intended to suffuse your domain entities. It does work well when your domain entities and data entities are the same classes. It's much nicer if you can rely on EF to keep track of the changes for you, and just call context.SaveChanges() when you're finished with your transactional work. It also means that your validation attributes don't have to be set twice, once on your domain models and once on your persisted entities - things like [Required] or [StringLength(x)] can be checked in your business logic, allowing you to catch invalid data states before you try to do a DB transaction and get an EntityValidationException. Finally, it's quick to code - you don't need to write a mapping layer or repository, but can instead work directly with the EF context. It's already a repository and a unit of work, so extra layers of abstraction don't accomplish anything.

A downside to combining your domain and persisted entities is that you end up with a bunch of [NotMapped] attributes scattered throughout your properties. Often, you will want domain-specific properties which are either get-only filters on persisted data, or are set later in your business logic and not persisted back into your database. Sometimes, you'll want to express your data in your domain models in a way that doesn't work very well with a database. For example, Entity Framework maps enums to an int column, but perhaps you want to map them to a human-readable string, so you don't need to consult a lookup when examining the database. Then you end up with a string property which is mapped, an enum property which isn't (but gets the string and maps it to the enum), and an API which exposes both! Similarly, if you want to combine complex types (tables) across contexts, you may wind up with a mapped OtherTableId and an unmapped ComplexType property, both of which are exposed as public on your class. This can be confusing for someone who isn't familiar with the project, and it unnecessarily bloats your domain models.

The more complex my business logic/domain, the more restrictive or cumbersome I find combining my domain and persisted entities to be. For projects with short deadlines, or that don't express a complex business layer, I feel that using EF entities for both purposes is appropriate, and there's no need to abstract your domain away from your persistence. For projects which need maximum ease-of-extension, or that need to express very complicated logic, I think you're better off separating the two and dealing with the extra persistence complexity.

One trick to avoiding the trouble of manually tracking your entity changes is to store the corresponding persisted entity ID in your domain model. This can be filled automatically by your mapping layer. Then, when you need to persist a change back to EF, retrieve the relevant persistent entity before doing any mapping. Then when you map the changes, EF will detect them automatically, and you can call context.SaveChanges() without having to track them by hand.

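using System.Collections.Generic;

public class OrganisationService
{
    // EFContext, DomainLicense and MappingService are the application's own types.
    public void PersistLicenses(IList<DomainLicense> licenses)
    {
        using (var context = new EFContext())
        {
            foreach (DomainLicense license in licenses)
            {
                // Find looks the entity up by primary key and starts tracking it.
                var persistedLicense = context.Licenses.Find(license.PersistedId);
                // Right-to-left mapping: copy the domain values onto the tracked entity.
                MappingService.Map(persistedLicense, license);
                // No Update() call is needed: change tracking detects the modifications.
            }
            context.SaveChanges();
        }
    }
}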

DDD CQRS – per-query and per-command authorization

For the first question I've been struggling with something similar. More and more I'm leaning towards a three-phased authorization scheme:

1) Authorization at the command/query level of "does this user ever have permission to execute this command?" In an MVC app this could probably be handled at the controller level, but I'm opting for a generic pre-handler that will query the permissions store based on the current user and the executing command.

2) Authorization inside the application service: "does this user ever have permission to access this entity?" In my case this will probably end up being an implicit check done simply through filters on the repository; in my domain this is basically a TenantId, with a little more granularity from an OrganizationId.

3) Authorization that relies on transient properties of your entities (such as Status) would be handled inside the domain. (Ex. "Only certain people can modify a closed ledger.") I'm opting to put that inside the domain because it relies heavily on the domain and business logic, and I'm not really comfortable exposing that in other places.
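The three phases can be sketched in Python as follows. The ledger domain, the role names and the allowed_commands set (standing in for a permissions store) are hypothetical:

```python
from dataclasses import dataclass, field


class Forbidden(Exception):
    pass


@dataclass
class Ledger:
    ledger_id: str
    tenant_id: str
    status: str = "open"
    entries: list = field(default_factory=list)

    def add_entry(self, user_roles: frozenset, amount: int) -> None:
        # Phase 3: depends on transient entity state, so it lives in the domain.
        if self.status == "closed" and "ledger_admin" not in user_roles:
            raise Forbidden("only ledger admins can modify a closed ledger")
        self.entries.append(amount)


@dataclass(frozen=True)
class AddLedgerEntry:
    ledger_id: str
    amount: int


@dataclass(frozen=True)
class CurrentUser:
    tenant_id: str
    roles: frozenset
    allowed_commands: frozenset  # stand-in for a permissions store


class LedgerRepository:
    def __init__(self, ledgers: list) -> None:
        self._ledgers = {ledger.ledger_id: ledger for ledger in ledgers}

    def get_for_tenant(self, ledger_id: str, tenant_id: str) -> Ledger:
        # Phase 2: an implicit tenant filter; other tenants' ledgers are invisible.
        ledger = self._ledgers.get(ledger_id)
        if ledger is None or ledger.tenant_id != tenant_id:
            raise Forbidden(f"ledger {ledger_id!r} is not accessible")
        return ledger


def authorize_command(user: CurrentUser, command: object) -> None:
    # Phase 1: generic pre-handler, "may this user ever run this command?"
    if type(command).__name__ not in user.allowed_commands:
        raise Forbidden(f"{type(command).__name__} is not permitted")


def handle_add_ledger_entry(user: CurrentUser, command: AddLedgerEntry,
                            repo: LedgerRepository) -> None:
    authorize_command(user, command)
    ledger = repo.get_for_tenant(command.ledger_id, user.tenant_id)
    ledger.add_entry(user.roles, command.amount)
```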

I'd love to hear others' responses to this idea -- tear it to shreds if you want (just provide some alternatives if you do :) )
