Design – Handle failures in Event Driven Architecture

Tags: design, domain-driven-design, event, event-handling, failure

Suppose you have a bounded context with ~30 business events and, for simplicity's sake, the same number of commands, such as ChangeUserEmailCommand -> UserEmailChangedEvent, initiated from a web UI. Processing a command may fail for the following main reasons (besides infrastructure failures, of course):

Validation issue (email uniqueness)
Technical issue (optimistic concurrency version mismatch)

I'd like to give clients the best possible user experience and show them what went wrong.

What is the best practice for signaling failures?

Would you create 30 more events like ChangeUserEmailFailedEvent? If not, what's your rule of thumb for deciding which events get a paired *FailedEvent?
Is it a good idea to just add a bool Success { get; set; } property to the existing events? That's probably not the best approach when you need to signal more failure details than just an error message.
Would you create a single ConcurrencyFailedEvent for all concurrency issues, with the source command type as part of its payload, just to separate this kind of failure from business validation failures?

The commands are processed asynchronously (via a broker). The read storage is separated from the write storage. No event sourcing.

As for why I would need this, I can think of the following:

Detailed error message pushed back to a client via web sockets, for example
Threat detection – reacting to an increased number of failed user registrations, which might indicate an attack
Monitoring – displaying the number of failed orders on a dashboard, for example. If it's within a certain range, I'd feel safe letting support handle it. If it's above a certain number, I probably need to dig into the logs.

Best Answer

It seems to me that you can get away with a single type if you only have generic errors to deal with:

ErrorEvent
    EventId
    ErredEventId
    ErrorType
    Message

However, I would go further and remove error events for UI purposes altogether.
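
As a rough illustration, the single error type could look like the following Python sketch. The field names mirror the ones listed above; the ErrorType values (VALIDATION, CONCURRENCY) are assumptions taken from the two failure kinds in the question, and count_by_type is a hypothetical helper showing how one generic type can still feed monitoring.

```python
import uuid
from dataclasses import dataclass, field
from enum import Enum


class ErrorType(Enum):
    # Hypothetical categories, based on the two failure kinds in the question.
    VALIDATION = "validation"
    CONCURRENCY = "concurrency"


@dataclass(frozen=True)
class ErrorEvent:
    erred_event_id: str    # id of the command/event whose processing failed
    error_type: ErrorType
    message: str
    # Each error event gets its own id, generated on creation.
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))


def count_by_type(events):
    """Tally error events per category, e.g. for a monitoring dashboard."""
    counts = {}
    for e in events:
        counts[e.error_type] = counts.get(e.error_type, 0) + 1
    return counts


errors = [
    ErrorEvent("cmd-1", ErrorType.VALIDATION, "Email already in use"),
    ErrorEvent("cmd-2", ErrorType.CONCURRENCY, "Version mismatch"),
    ErrorEvent("cmd-3", ErrorType.VALIDATION, "Email already in use"),
]
counts = count_by_type(errors)
```

Because the category travels in the payload, consumers can separate concurrency failures from validation failures without a dedicated event type for each.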

If the user is waiting to see whether something failed, you can pass the error back from the function they called and only write the event on success.
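
A minimal sketch of that idea, assuming a hypothetical in-memory UserService (a real system would use a database and a broker, and Result and change_email are names invented for this example): the caller gets the failure directly, and only the successful outcome is recorded as an event.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EmailUpdated:
    user_id: str
    new_email: str


@dataclass(frozen=True)
class Result:
    ok: bool
    error: Optional[str] = None


class UserService:
    """Hypothetical in-memory service standing in for storage and a broker."""

    def __init__(self):
        self.emails = {}       # user_id -> email
        self.published = []    # stands in for the event stream

    def change_email(self, user_id: str, new_email: str) -> Result:
        # Validation failure: report it to the caller, publish nothing.
        taken = any(
            email == new_email and uid != user_id
            for uid, email in self.emails.items()
        )
        if taken:
            return Result(ok=False, error="Email already in use")
        self.emails[user_id] = new_email
        # Only the successful outcome becomes an event.
        self.published.append(EmailUpdated(user_id, new_email))
        return Result(ok=True)


service = UserService()
first = service.change_email("u1", "a@example.com")
second = service.change_email("u2", "a@example.com")
```

Here the second call fails on uniqueness, the caller sees the error in the returned Result, and the event stream contains a single EmailUpdated.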

So, just EmailUpdated instead of EmailUpdateRequest, EmailUpdateFailed, EmailUpdated, and so on.

You can see how going fully event-driven can explode the number of types you need. If they are all internal to a thick client, you have compile-time checking to help you handle them all, but if you have to pass them across a distributed system, it becomes ridiculous.
