AWS SQS – FIFO Limit Workaround

aws, message-queue

According to the AWS feature page for SQS:

FIFO queues support up to 300 messages per second

Standard queues support a nearly unlimited number of transactions per second (TPS) per API action.

I'm trying to build a system that will add notifications to a queue, which will then be sent to a customer's device using push notifications (SMS, APN, webhooks, email, etc).

There will be a Lambda function that will read items off this queue and actually handle sending the message to the user.

Problem is, I'd like this system to be able to scale as efficiently as possible. Being constrained to 300 notifications per second might cause problems in the future. So I want to design this in a way that is much more scalable than that.

I have thought about building some type of system that would use a standard queue, then check whether a notification has already been sent by keeping a database that stores the IDs of notifications that have been sent. That might work. But at that point I think I'd be opening the door for race conditions. What happens if two Lambda functions get triggered at exactly the same time for the same notification? Neither has been marked as sent yet, and the user will end up with 2 notifications instead of one.
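The check-then-send idea above can be sketched as follows; the names and the in-memory set are illustrative stand-ins (not real AWS calls), with the race window marked:

```python
# Naive check-then-send dedup. A set stands in for the "already sent" DB table;
# handle() stands in for the Lambda handler. All names here are illustrative.
sent_ids = set()

def handle(notification_id: str, deliveries: list) -> None:
    if notification_id in sent_ids:      # 1. check the database
        return                           #    duplicate: drop it
    # <-- race window: a second concurrent invocation that reaches this point
    #     before step 2 runs also passes the check above, so the user would
    #     receive the notification twice.
    sent_ids.add(notification_id)        # 2. mark as sent
    deliveries.append(notification_id)   # 3. actually push to the device

deliveries = []
handle("n-1", deliveries)
handle("n-1", deliveries)      # a *sequential* duplicate is caught...
print(deliveries)              # ['n-1'] ...but concurrent ones are not
```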


How can I design a system that has the best of both worlds: a nearly unlimited number of transactions per second, while ensuring that no duplicate notifications are sent?

I don't think I mind quite as much if a Lambda function gets triggered twice for 1 notification, so long as it doesn't get sent multiple times to the user. Of course if I can completely prevent this, that'd be awesome too, so that I can reduce cost.


I'd also love to keep using AWS and the more serverless technologies of AWS if possible. I know there is software and ways I could provision EC2 or other types of instances for this. But that takes out the huge advantage of serverless, which is what I'm really aiming for.

Best Answer

Use a normal queue.

Having used these before, I would simply accept the very low probability of a duplicate message. But you can easily protect against duplicates by adding a database to your processor.

SNS -> Routing Service -> Message Sending Lambda cloud

Give each message a GUID and have the routing service write it to an RDBMS table within a transaction, discarding messages that have already been sent.

The RDBMS will be a bottleneck, but you can safely delete old message IDs from the table to keep it small, and it will easily cope with thousands of transactions a second.
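A minimal sketch of this dedup pattern, using SQLite's primary-key constraint in place of a full RDBMS; the table and column names are assumptions for illustration. The key point is that the insert and the uniqueness check happen atomically inside one transaction, so only the first invocation can "claim" a given GUID:

```python
import sqlite3

# In-memory SQLite stands in for the real RDBMS; schema is illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sent_messages (message_id TEXT PRIMARY KEY)")

def try_claim(message_id: str) -> bool:
    """Atomically claim a message ID. Returns True only for the first caller;
    a duplicate insert violates the primary key, so the duplicate is discarded
    without any check-then-act race."""
    try:
        with conn:  # opens a transaction, committed on success
            conn.execute(
                "INSERT INTO sent_messages (message_id) VALUES (?)",
                (message_id,),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # already claimed: drop this duplicate message

print(try_claim("msg-123"))  # True  - first delivery attempt wins
print(try_claim("msg-123"))  # False - the retry is discarded
```

The same shape works with any store that offers an atomic conditional write (a unique constraint in Postgres/MySQL, or a conditional put in DynamoDB).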

Also: if you are constantly sending messages, there isn't really an advantage to using serverless Lambdas, as you will always have several instantiated at any given time.

I would run the Routing Service as a standard web API or Windows service on EC2 boxes and have that talk to the DB, so the same connection can be reused for all requests. Then have a separate stateless async service to deal with the actual sending.

Here Lambdas might help with burst scaling, although personally I am unconvinced. It's really a cost calculation, though: run the service both ways and see which is cheaper.
