CQRS – Command Handling Fail Feedback

Tags: cqrs, design, domain-driven-design, error-handling

We are developing a bounded context using the CQRS approach. We ended up with command handlers emitting events. This seems like a poor idea to us, but we can't find an alternative. We struggle with two particular groups of scenarios:

  1. Creating a new aggregate.

Creating an aggregate may result in either success or failure. In the case of success, it is straightforward: the aggregate records the AggregateCreated event (built within its constructor). But in the case of failure, the aggregate cannot emit or publish anything because it doesn't exist. In this case our command handler emits CreatingAggregateFailed, which we perceive as a domain leak.

  2. Resource not found.

The second scenario concerns the inability to find a particular resource. For example, we may want to remove a resource that does not exist. In our implementation, Repository::find() throws the NotFound exception. The exception is caught in the handler, which emits the AggregateNotFound event.

Based on those events, we build the relevant process managers and responses. It seems awkward that domain (or application) events are emitted from outside the aggregates. On the other hand, these events also concern aggregate instances that do not exist, so no aggregate could emit them.

Consider the simplest scenario. I dispatch the command AddTeamMember($teamGuid, $userGuid, $role). If the team, user and role exist, but the command violates one of the aggregate's invariants, the aggregate may register the MemberRejected event. All is good. But the user, team or given role may not even exist, so there is no aggregate capable of registering an event. I need feedback about this failure to take the appropriate action (either in a process manager, or to inform the command issuer). I have considered a MembershipRequest aggregate as the command receiver. Then I always have a valid aggregate to publish events, and the events are meaningful within that aggregate. But this introduces additional complexity: I need an intermediate aggregate to handle the "resource not found" exception for each possible command.

I have come up with a new idea. I will illustrate this with code.

Old handler version:

/**
 * @param CreateChannel $command
 */
protected function handleCreate(CreateChannel $command): void
{
    try {
        $channel = new Channel($command->getGuid(), $command->getSymbol(), $command->getLangCodes());
        $this->channelRepository->save($channel);
    } catch (InvalidData $e) {
        $this->eventBus->publish($channel, new ChannelCreationFailed($command->getGuid()));
    }
}

New idea:

/**
 * @param CreateChannel $command
 */
protected function handleCreate(CreateChannel $command): void
{
    try {
        $channel = new Channel($command->getGuid());
        $channel->create($command->getSymbol(), $command->getLangCodes());
        $this->channelRepository->save($channel);
    } catch (InvalidChannelData $e) {
        // ?
    } finally {
        $this->eventBus->publish($channel, new ChannelCreationFailed($command->getGuid()));
    }
}

But this has obvious drawbacks. First of all, the service layer affects the aggregate's design. It also reinvents concepts that the object language already provides. It forces every other method to check whether the aggregate is in the "created" state. Perhaps it is important to differentiate between object creation and aggregate creation? This approach assumes no exceptions are raised from the constructor. The constructor would only ever receive a valid GUID (guaranteed by the command), and nothing else.

Always having an aggregate with a GUID also solves the issue with non-existing referenced aggregates. Using double dispatch, it is possible to raise the "not found" exceptions from the source aggregate.

$team->addMember($userGuid, $usersRepository);

But this makes the aggregate's API ugly compared to

$team->addMember($user);

Best Answer

Don't publish domain events that your domain experts wouldn't care about.

If a specific failure case is an identified part of a business process (typically something that would come up during an Event Storming session with the business people), find a term for it in the Ubiquitous Language and definitely publish an event for it.
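As a rough illustration of a business-level failure modelled as a domain event, here is a minimal sketch using the MemberRejected event from the question. The Team internals, the MemberAdded event, the "team is full" rule and the releaseEvents() method are all assumptions made up for this example, not part of the asker's code.

```php
<?php

// Hypothetical success event (name assumed for this sketch).
final class MemberAdded
{
    public string $teamGuid;
    public string $userGuid;

    public function __construct(string $teamGuid, string $userGuid)
    {
        $this->teamGuid = $teamGuid;
        $this->userGuid = $userGuid;
    }
}

// Business failure the domain experts care about, so it is a real event.
final class MemberRejected
{
    public string $teamGuid;
    public string $userGuid;
    public string $reason;

    public function __construct(string $teamGuid, string $userGuid, string $reason)
    {
        $this->teamGuid = $teamGuid;
        $this->userGuid = $userGuid;
        $this->reason = $reason;
    }
}

final class Team
{
    private string $guid;
    private int $maxMembers; // assumed invariant: a team has a size limit
    /** @var string[] */
    private array $memberGuids = [];
    /** @var object[] events recorded since the last release */
    private array $recordedEvents = [];

    public function __construct(string $guid, int $maxMembers)
    {
        $this->guid = $guid;
        $this->maxMembers = $maxMembers;
    }

    public function addMember(string $userGuid): void
    {
        // Invariant violated: record the rejection instead of throwing.
        if (count($this->memberGuids) >= $this->maxMembers) {
            $this->recordedEvents[] = new MemberRejected($this->guid, $userGuid, 'team is full');
            return;
        }
        $this->memberGuids[] = $userGuid;
        $this->recordedEvents[] = new MemberAdded($this->guid, $userGuid);
    }

    /** Returns the recorded events and clears the internal list. */
    public function releaseEvents(): array
    {
        $events = $this->recordedEvents;
        $this->recordedEvents = [];
        return $events;
    }
}
```

The handler would then publish whatever releaseEvents() returns after saving the aggregate, so the rejection goes through the same pub-sub cycle as any other domain event.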

But in non-nominal cases such as network interruptions, system outages, configuration mistakes and the like, I wouldn't go through the regular pub-sub cycle. In the case of a command prompted by a user through a UI, notify them that an error occurred. If the command was executed by a process manager, it will know that something went wrong and maybe

  • retry
  • execute a compensation command
  • or notify the admins

depending on the situation. If a compensation command is sent and a new event emitted as a result, correlation ids can help you retrace which original event or command triggered the compensation and with the help of logs, diagnose and fix the problem.
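For the non-nominal cases, the retry / compensate / notify decision could look roughly like the sketch below. It is only one possible shape: CommandRunner, TransientFailure, the in-memory log and the outcome strings are hypothetical names invented for this example, and a real process manager would use a proper logger and message bus.

```php
<?php

// Hypothetical exception for an infrastructure problem (timeout, outage).
final class TransientFailure extends RuntimeException
{
}

// Hypothetical process manager step; all names are assumptions.
final class CommandRunner
{
    /** @var string[] log lines kept in memory for the sketch */
    public array $log = [];
    private int $maxAttempts;

    public function __construct(int $maxAttempts)
    {
        $this->maxAttempts = $maxAttempts;
    }

    /**
     * Runs the command, retrying on transient failures. If every attempt
     * fails, sends the compensation; if that fails too, escalates.
     *
     * @return string 'succeeded', 'compensated' or 'escalated'
     */
    public function run(string $correlationId, callable $command, callable $compensation): string
    {
        for ($attempt = 1; $attempt <= $this->maxAttempts; $attempt++) {
            try {
                $command($correlationId);
                $this->log[] = sprintf('[%s] succeeded on attempt %d', $correlationId, $attempt);
                return 'succeeded';
            } catch (TransientFailure $e) {
                // The correlation id ties each log line to the original command.
                $this->log[] = sprintf('[%s] attempt %d failed: %s', $correlationId, $attempt, $e->getMessage());
            }
        }

        try {
            $compensation($correlationId);
            $this->log[] = sprintf('[%s] compensation sent', $correlationId);
            return 'compensated';
        } catch (Throwable $e) {
            // Last resort: a human has to look at it.
            $this->log[] = sprintf('[%s] compensation failed, notifying admins', $correlationId);
            return 'escalated';
        }
    }
}
```

Note that no failure event is published here: the caller learns about the problem from the return value (or, for a UI-issued command, from the exception reaching the controller), and the correlation id in the log is what lets you trace a compensation back to the command that caused it.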
