Domain-Driven Design – How to Handle Failing Messages in DDD?

domain-driven-design, error-handling, message-queue

We have a set of micro-services, all built according to Domain-Driven Design (DDD). The micro-services communicate with each other via domain events (OrderSubmittedEvent, CustomerBilledEvent, …). We implemented everything with Spring Boot and use JMS with ActiveMQ as the message broker. All events are published to a topic, and every micro-service can implement a JMS listener for any event of interest.

For error handling, however, we rely heavily on ActiveMQ. As I said, all events are sent to a topic. We configured ActiveMQ to use Virtual Topics for the listeners, which means that ActiveMQ creates a separate queue for each listener and copies each message from the topic to that queue. If a listener fails, the message is sent to a Dead Letter Queue (DLQ). This helps us because we can monitor the DLQ, see the errors, fix them, and put the message back on the queue of the failed listener for another try.

We have a couple of problems with that:

  • We end up with a lot of queues as each and every listener will get its own queue.
  • We cannot switch the message broker as we rely on ActiveMQ's Virtual Topics.
  • We cannot use ActiveMQ out of the box, as we always have to configure it. The client we develop the software for does this configuration manually in his system. With so many queues and small pitfalls, this is very cumbersome for him.

We don't have a better idea, which is why I'm reaching out to you. How can we handle failing messages? What's your approach?

Best Answer

One way to deal with this is to use pull-based listeners instead of push-based ones. Each listener keeps track of the last message it read and can request "messages since X". To trigger these requests you could use polling, a notification when a new event comes in, or both. There are no more per-listener queues, but each listener needs to store the id of its last processed message. You also need to store the events in a way that preserves ordering, so that "messages since X" requests can be answered.
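To make the idea concrete, here is a minimal in-memory sketch of a pull-based listener. It is only an illustration under assumptions: `StoredEvent`, `EventLog`, `PullingListener` and `PullListenerDemo` are hypothetical names, the list stands in for an ordered, durable event table, and a real listener would persist its last processed id instead of holding it in a field.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** One stored domain event; the sequence number defines the total order. */
final class StoredEvent {
    final long sequence;
    final String type;

    StoredEvent(long sequence, String type) {
        this.sequence = sequence;
        this.type = type;
    }
}

/** Append-only event store. Hypothetical in-memory stand-in for a database table. */
final class EventLog {
    private final List<StoredEvent> events = new ArrayList<>();

    /** Appends an event and returns its sequence number (1, 2, 3, ...). */
    synchronized long append(String type) {
        long sequence = events.size() + 1;
        events.add(new StoredEvent(sequence, type));
        return sequence;
    }

    /** "Messages since X": all events with a sequence greater than afterSequence, in order. */
    synchronized List<StoredEvent> readAfter(long afterSequence) {
        // The event with sequence n sits at index n - 1, so reading starts at index afterSequence.
        int from = (int) Math.min(Math.max(afterSequence, 0L), (long) events.size());
        return new ArrayList<>(events.subList(from, events.size()));
    }
}

/** Pull-based listener that remembers the last sequence it processed. */
final class PullingListener {
    private final EventLog log;
    private final Consumer<StoredEvent> handler;
    private long lastProcessed = 0; // assumption: a real listener would persist this value

    PullingListener(EventLog log, Consumer<StoredEvent> handler) {
        this.log = log;
        this.handler = handler;
    }

    /** Call from a timer (polling) or when notified of a new event. Returns the number of events handled. */
    int poll() {
        int handled = 0;
        for (StoredEvent event : log.readAfter(lastProcessed)) {
            handler.accept(event);
            lastProcessed = event.sequence; // advance only after the handler returned normally
            handled++;
        }
        return handled;
    }

    long lastProcessed() {
        return lastProcessed;
    }
}

class PullListenerDemo {
    public static void main(String[] args) {
        EventLog log = new EventLog();
        log.append("OrderSubmittedEvent");
        log.append("CustomerBilledEvent");

        PullingListener billing =
                new PullingListener(log, e -> System.out.println(e.sequence + " " + e.type));
        billing.poll();                    // handles events 1 and 2
        log.append("OrderSubmittedEvent");
        billing.poll();                    // handles only event 3
    }
}
```

Because every listener only holds a position in the shared log, adding a listener does not require a new queue or any broker configuration.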

Whenever a listener fails to process an event, it can log the failure to whatever logging infrastructure you have set up. You will then observe the failure, or be notified of it, through your monitoring infrastructure. Because the listener's last processed message id is not advanced past the failed event, the listener picks up where it left off once you fix it and redeploy it.
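The failure handling could look roughly like the following sketch. The `ResumableListener` class is a hypothetical name, the ordered list stands in for the stored events, and passing a different handler to `catchUp` simulates redeploying a fixed listener. It uses `java.util.logging` only to stay dependency-free; any logging setup would do.

```java
import java.util.List;
import java.util.function.Consumer;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Listener that stops at the first failing event and resumes there later. */
final class ResumableListener {
    private static final Logger LOG = Logger.getLogger(ResumableListener.class.getName());

    private final String name;
    private int checkpoint = 0; // number of events handled so far; assumption: persisted in a real system

    ResumableListener(String name) {
        this.name = name;
    }

    /**
     * Handles all events after the checkpoint, in order.
     * Returns true if the listener caught up, false if it stopped at a failing event.
     */
    boolean catchUp(List<String> orderedEvents, Consumer<String> handler) {
        while (checkpoint < orderedEvents.size()) {
            String event = orderedEvents.get(checkpoint);
            try {
                handler.accept(event);
            } catch (RuntimeException ex) {
                // Log for monitoring and stop; the checkpoint stays on the failed event.
                LOG.log(Level.SEVERE, name + " failed on event #" + (checkpoint + 1) + ": " + event, ex);
                return false;
            }
            checkpoint++; // advance only after the event was handled successfully
        }
        return true;
    }

    int checkpoint() {
        return checkpoint;
    }
}
```

Stopping at the failed event keeps ordering intact for that listener, at the cost of blocking it until the fix is deployed. Other listeners are unaffected because each one tracks its own checkpoint.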
