RabbitMQ DLQ Messages – Save to Database for Analysis

message-queue queue rabbitmq

My project has an integration with an external system. We need to send some important information to this system. For this, we created a microservice to connect to the external system. This microservice receives async messages from our internal systems on a dedicated queue, reads them, and tries to send them to the external system. Very simple.

Before explaining my doubt, let me give you some context.

Context

We are not sure this external system is reliable. Sometimes it will be down and we need some way to recover from that, so we implemented a retry mechanism. But we also imagine the external system could be down for a long time, and might even reject some valid messages until we talk to its support team. So we also created a durable dead letter queue to receive these rejected messages.
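For reference, this kind of dead-lettering is configured through optional arguments on the main queue. A minimal sketch of those arguments; the queue and exchange names (`outbound`, `dlx`, `outbound.dlq`) are made up for illustration:

```python
# Dead-letter routing is configured per queue via optional arguments.
# Names here ("outbound", "dlx", "outbound.dlq") are hypothetical.
MAIN_QUEUE_ARGUMENTS = {
    # Messages rejected with requeue=False (or expired) are republished
    # to this exchange by the broker...
    "x-dead-letter-exchange": "dlx",
    # ...with this routing key, which the durable DLQ is bound to.
    "x-dead-letter-routing-key": "outbound.dlq",
}

# With a client library such as pika, these arguments would be passed
# when declaring the (durable) queues, roughly:
#   channel.queue_declare(queue="outbound", durable=True,
#                         arguments=MAIN_QUEUE_ARGUMENTS)
#   channel.queue_declare(queue="outbound.dlq", durable=True)
```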

In the worst scenario, there will be a lot of messages in the DLQ, and we will have to analyze why they were rejected. So my team had the idea to pull the messages from the DLQ and persist them as JSON in a database, because we imagine it will be necessary to edit them.

The question

And my question is about this last paragraph: saving messages to a database. I'm not sure it is a good idea; it seems to me that we are just replicating the queue's durability feature in a database.

The idea came from the fact that messages in a database will be easier to analyze (maybe we will need to do that), easier to edit (maybe we will need that, probably not), and safer to manipulate, because one mistake in the RabbitMQ Management UI and the message is gone. We will also need some kind of cron job just to read these messages from the database and send them back to the main queue.
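To make the proposal concrete, the flow would look roughly like this. A minimal sketch using an in-memory SQLite table as a stand-in for the real database and a plain callable as a stand-in for the publisher; table and column names are made up:

```python
import json
import sqlite3

# Stand-in for the real database; schema is illustrative only.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE dlq_message (
    id     INTEGER PRIMARY KEY,
    body   TEXT NOT NULL,   -- the message payload as JSON
    status TEXT NOT NULL    -- 'pending' -> 'resent'
)""")

def persist_from_dlq(payload: dict) -> None:
    """Step 1: a consumer on the DLQ stores each rejected message."""
    db.execute("INSERT INTO dlq_message (body, status) VALUES (?, 'pending')",
               (json.dumps(payload),))

def resend_pending(publish) -> int:
    """Step 2: a cron job republishes analyzed/edited messages."""
    rows = db.execute(
        "SELECT id, body FROM dlq_message WHERE status != 'resent'").fetchall()
    for row_id, body in rows:
        publish(json.loads(body))  # send back to the main queue
        db.execute("UPDATE dlq_message SET status = 'resent' WHERE id = ?",
                   (row_id,))
    return len(rows)

# Example: one rejected message is persisted, then resent.
persist_from_dlq({"order_id": 42, "amount": "10.00"})
sent = []
resend_pending(sent.append)
```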

As you can see, there are a lot of maybes and a lot of extra work (database, tables, cron job, etc.), and I'm not really comfortable with the solution. Of course we would be safer if the messages could be analyzed and edited in a relational database, but I'm not sure we will ever use that feature.

I did some research on using RabbitMQ for rejected messages that the team can analyze and send back to the main queue, but this does not seem to be a common scenario.

Best Answer

The proper term for the issue you are addressing is 'poison message processing'. Dead letter queues are typically used by messaging systems for undeliverable messages. A poison message is one that a client application cannot successfully process. A 'poison message queue' is a queue that is configured to receive these messages in order to prevent a message from being read and rolled back indefinitely. This is good practice in order to prevent one bad request from causing messages to back up and exceed the queue depth, which will start to cause upstream failures.
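One common way to detect a poison message in RabbitMQ is to inspect the `x-death` header, which the broker appends each time a message is dead-lettered. A minimal sketch of that decision, assuming the header has already been extracted into a Python structure; the retry limit of 3 is an arbitrary choice:

```python
MAX_RETRIES = 3  # arbitrary limit for this sketch

def is_poison(headers: dict) -> bool:
    """Return True once a message has been dead-lettered too many times.

    RabbitMQ records each dead-lettering event in the 'x-death' header:
    a list of entries, one per (queue, reason) pair, each carrying a
    'count' of how often that event has happened.
    """
    deaths = headers.get("x-death") or []
    total = sum(entry.get("count", 0) for entry in deaths)
    return total >= MAX_RETRIES

# First delivery: no header yet, keep processing.
fresh = {}
# After three trips through the DLX, park it for analysis instead.
cycled = {"x-death": [{"queue": "outbound", "reason": "rejected", "count": 3}]}
```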

In the specific case that you are concerned with where a downstream application is offline, one easy solution is to stop processing messages on the queue. There are a couple of issues with this. The main one is that if you have a high volume of messages coming in, you could fill the queue. Another potential issue is that if that queue is not persistent, having a lot of messages sitting in it puts you at risk for data loss.
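A sketch of that "stop processing" idea: gate consumption on a downstream health check, so messages simply accumulate in the (durable) queue while the external system is offline. The queue and health flag here are stand-ins for the real broker and a real health probe:

```python
from collections import deque

queue = deque(["msg-1", "msg-2"])  # stand-in for the RabbitMQ queue
downstream_up = False              # stand-in for a real health check

def drain(send) -> int:
    """Consume only while the downstream system is reachable."""
    processed = 0
    while queue and downstream_up:
        send(queue.popleft())
        processed += 1
    return processed

sent = []
# While the external system is offline, nothing is consumed and the
# messages stay safely in the durable queue.
offline_count = drain(sent.append)
# Once it recovers, flip the flag and resume consuming.
downstream_up = True
online_count = drain(sent.append)
```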

My experience with messaging solutions leads me to this general recommendation: do not rely on queues for preventing data loss. No matter what the vendor claims about 'guaranteed delivery', there are many pitfalls and challenges in making such a system leak-proof. If you need to make sure something arrives, you need a persistent store where you keep enough information to regenerate a message in the case of loss. Often this is a database. This is similar to what you proposed, but instead of pushing messages to a DB as they are read off the queue, you persist to a DB at the point of origination. In addition, I would recommend a ledger-type approach of recording (at least) the final state of each record in that store. If you need to purge data, you can do so in bulk later on.
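The ledger idea can be sketched like this: the producer writes a row before publishing, and the final state of each message is recorded, so anything lost in transit can be regenerated from the store. An in-memory SQLite table stands in for the real database; table and state names are illustrative:

```python
import sqlite3

ledger = sqlite3.connect(":memory:")
ledger.execute("""CREATE TABLE message_ledger (
    id      INTEGER PRIMARY KEY,
    payload TEXT NOT NULL,
    state   TEXT NOT NULL   -- 'created' -> 'published' -> 'delivered'
)""")

def record_and_publish(payload: str, publish) -> int:
    """Persist at the point of origination, then hand off to the queue."""
    cur = ledger.execute(
        "INSERT INTO message_ledger (payload, state) VALUES (?, 'created')",
        (payload,))
    msg_id = cur.lastrowid
    publish(msg_id, payload)
    ledger.execute("UPDATE message_ledger SET state = 'published' WHERE id = ?",
                   (msg_id,))
    return msg_id

def mark_delivered(msg_id: int) -> None:
    """Record the final state once delivery is confirmed downstream."""
    ledger.execute("UPDATE message_ledger SET state = 'delivered' WHERE id = ?",
                   (msg_id,))

def undelivered() -> list:
    """Anything not confirmed can be regenerated from the ledger."""
    return list(ledger.execute(
        "SELECT id, payload FROM message_ledger WHERE state != 'delivered'"))

# Two messages originate; only the first delivery is ever confirmed,
# so the second remains recoverable from the ledger.
first = record_and_publish("invoice-1", lambda *_: None)
record_and_publish("invoice-2", lambda *_: None)
mark_delivered(first)
```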

It might seem that if you do this, queues become irrelevant, but that is not the case. Queues are a great way to move and process messages in a distributed system while avoiding contention. By using a DB ledger together with queues, you get the best of both worlds.
