As the other answers mentioned, you can't prevent duplicate messages coming through from SQS.
Most of the time your messages will be handed to one of your consumers once, but you will run into duplicates at some stage.
I don't think there is an easy answer to this question, because it entails coming up with a proper architecture that can cope with duplicates, meaning it's idempotent in nature.
If all the workers in your distributed architecture were idempotent, it would be easy, because you wouldn't need to worry about duplicates. But in reality, that sort of environment does not exist, somewhere along the way something will not be able to handle it.
I am currently working on a project where it's required of me to solve this, and come up with an approach to handle it. I thought it might benefit others to share my thinking here. And it might be a good place to get some feedback on my thinking.
Fact store
It's a pretty good idea to develop services so that they collect facts which can theoretically be replayed to reproduce the same state in all the affected downstream systems.
For example, let's say you are building a message broker for a stock trading platform. (I have actually worked on a project like this before, it was horrible, but also a good learning experience.)
Now let's say that that trades come in, and there are 3 systems interested in it:
- An old school mainframe which needs to stay updated
- A system that collates all the trades and share it with partners on a FTP server
- The service that records the trade, and reallocates shares to the new owner
It's a bit convoluted, I know, but the idea is that one message (fact) coming in, has various distributed downstream effects.
Now let's imagine that we maintain a fact store, a recording of all the trades coming into our broker. And that all 3 downstream service owners calls us to tell us that they have lost all of their data from the last 3 days. The FTP download is 3 days behind, the mainframe is 3 days behind, and all the trades are 3 days behind.
Because we have the fact store, we could theoretically replay all these messages from a certain time to a certain time. In our example that would be from 3 days ago until now. And the downstream services could be caught up.
This example might seem a bit over the top, but I'm trying to convey something very particular: the facts are the important things to keep track of, because that's where we are going to use in our architecture to battle duplicates.
How the Fact store helps us with duplicate messages
Provided you implement your fact store on a persistence tier that gives you the CA parts of the CAP theorem, consistency and availability, you can do the following:
As soon as a message is received from a queue, you check in your fact store whether you've already seen this message before, and if you have, whether it's locked at the moment, and in a pending state. In my case, I will be using MongoDB to implement my fact store, as I am very comfortable with it, but various other DB technologies should be able to handle this.
If the fact does not exist yet, it gets inserted into the fact store, with a pending state, and a lock expiration time. This should be done using atomic operations, because you do not want this to happen twice! This is where you ensure your service's idempotence.
Happy case - happens most of the time
When the Fact store comes back to your service telling it that the fact did not exist, and that a lock was created, the service attempts to do it's work. Once it's done, it deletes the SQS message, and marks the fact as completed.
Duplicate message
So that's what happens when a message comes through and it's not a duplicate. But let's look at when a duplicate message comes in. The service picks it up, and asks the fact store to record it with a lock. The fact store tells it that it already exists, and that it's locked. The service ignores the message and skips over it! Once the message processing is done, by the other worker, it will delete this message from the queue, and we won't see it again.
Disaster case - happens rarely
So what happens when a service records the fact for the first time in the store, then get a lock for a certain period, but falls over? Well SQS will present a message to you again, if it was picked up, but not deleted within a certain period after it was served from the queue. Thats why we code up our fact store such that a service maintains a lock for a limited time. Because if it falls over, we want SQS to present the message to the service, or another instance thereof at a later time, allowing that service to assume that the fact should be incorporated into state (executed) again.
Best Answer
It sounds like you are misinterpreting what you are seeing.
Messages "in flight" are not pending delivery, they're messages that have already been delivered but not further acted on by the consumer.
When a consumer receives a message, it has to -- at some point -- either delete the message, or send a request to increase the timeout for that message; otherwise the message becomes visible again after the timeout expires. If a consumer fails to do one of these things, the message automatically becomes visible again. The visibility timeout is how long the consumer has before one of these things must be done.
Messages should not be "in flight" without something having already received them -- but that "something" can include the console itself, as you'll note on the pop-up you see when you choose "View/Delete Messages" in the console (unless you already checked the "Don't show this again" checkbox):
Messages displayed in the console are "in flight" while the console is observing the queue from the "View/Delete Messages" screen.
The part that does not make obvious sense is messages being in flight "for several minutes" if your default visibility timeout is only 5 seconds and nothing in your code is increasing that timeout... however... that could be explained almost perfectly by your consumers not properly disposing of the message, causing it to timeout and immediately be redelivered, giving the impression that a single instance of the message was remaining in-flight, when in fact, the message is briefly transitioning back to visible, only to be claimed almost immediately by another consumer, taking it back to in-flight again.