DynamoDB Streams with Lambda, how to process related messages in order

amazon-dynamodb, amazon-lambda, amazon-web-services

I want to use DynamoDB Streams + AWS Lambda to process chat messages. Messages regarding the same conversation user_idX:user_idY (a room) must be processed in order. Global ordering is not important.

Assuming that I write to DynamoDB in the correct order (room:msg1, room:msg2, etc.), how can I guarantee that the stream feeds AWS Lambda sequentially, so that related messages (those belonging to the same room) are processed in order within a single stream?

For example, if I have 2 shards, how do I make sure that a logical group always goes to the same shard?

I must accomplish this:

Shard 1: 12:12:msg3 12:12:msg2 12:12:msg1 ==> consumer
Shard 2: 13:24:msg2 51:91:msg3 13:24:msg1 51:92:msg2 51:92:msg1 ==> consumer

And not this (the messages respect the order in which I saved them to the database, but they are placed in different shards, so different sequences for the same room would incorrectly be processed in parallel):

Shard 1: 13:24:msg2 51:92:msg2 12:12:msg2 51:92:msg2 12:12:msg1 ==> consumer
Shard 2: 51:91:msg3 12:12:msg3 13:24:msg1 51:92:msg1 ==> consumer
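For reference, this is roughly the table layout I have in mind, sketched with boto3. The table name ChatMessages and the attribute names room_id / msg_seq are placeholders I chose for the sketch, not anything prescribed by DynamoDB:

    import boto3

    dynamodb = boto3.client("dynamodb")

    # One item per chat message: the room is the partition key, the message
    # sequence number is the sort key, and a stream is enabled on the table.
    dynamodb.create_table(
        TableName="ChatMessages",
        AttributeDefinitions=[
            {"AttributeName": "room_id", "AttributeType": "S"},
            {"AttributeName": "msg_seq", "AttributeType": "N"},
        ],
        KeySchema=[
            {"AttributeName": "room_id", "KeyType": "HASH"},
            {"AttributeName": "msg_seq", "KeyType": "RANGE"},
        ],
        BillingMode="PAY_PER_REQUEST",
        StreamSpecification={
            "StreamEnabled": True,
            "StreamViewType": "NEW_AND_OLD_IMAGES",
        },
    )

    # Messages for the same room are written in order under the same partition key.
    dynamodb.put_item(
        TableName="ChatMessages",
        Item={"room_id": {"S": "12:12"}, "msg_seq": {"N": "1"}, "body": {"S": "msg1"}},
    )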

This official post mentions this, but I couldn't find anywhere in the docs how to implement it:

The relative ordering of a sequence of changes made to a single primary key will be preserved within a shard. Further, a given key will be present in at most one of a set of sibling shards that are active at a given point in time. As a result, your code can simply process the stream records within a shard in order to accurately track changes to an item.
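If that statement holds, the consumer side seems straightforward. This is a minimal Lambda handler sketch (using the placeholder attribute names from the table sketch above, plus a stand-in process_message helper) that just walks each batch top to bottom:

    def process_message(room, image):
        # Stand-in for real chat-message handling; just print for the sketch.
        print("room", room, "message", image)

    def handler(event, context):
        # Records in event["Records"] arrive in shard order, so iterating the
        # batch top to bottom preserves per-key order for every room whose
        # records live in this shard.
        for record in event["Records"]:
            keys = record["dynamodb"]["Keys"]
            room = keys["room_id"]["S"]
            if record["eventName"] == "INSERT":
                process_message(room, record["dynamodb"].get("NewImage", {}))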

Questions

1) How do I set a partition key in DynamoDB Streams?
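As far as I can tell, the stream records I receive have no partition key of their own; each one just carries the table's key attributes, roughly like this (shape abbreviated, attribute names are my placeholders):

    record = {
        "eventName": "INSERT",
        "dynamodb": {
            "Keys": {
                "room_id": {"S": "12:12"},  # the table's partition key
                "msg_seq": {"N": "1"},      # the table's sort key
            },
            "SequenceNumber": "111",
            "StreamViewType": "NEW_AND_OLD_IMAGES",
        },
    }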

2) How do I create stream shards that guarantee consistent delivery per partition key?

3) Is this really possible after all? The official article says that "a given key will be present in at most one of a set of sibling shards that are active at a given point in time", so it seems that msg1 may go to shard 1 and then msg2 to shard 2, as in my example above?
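The part about sibling shards looks inspectable through the stream's shard lineage. A small sketch (again assuming the hypothetical ChatMessages table from above) that prints each shard and its parent:

    import boto3

    dynamodb = boto3.client("dynamodb")
    streams = boto3.client("dynamodbstreams")

    stream_arn = dynamodb.describe_table(TableName="ChatMessages")["Table"]["LatestStreamArn"]

    # List the shards and their parent/child lineage. When a shard is closed,
    # new records for its keys continue in a child shard, so reading a parent
    # shard to its end before its children preserves per-key order.
    description = streams.describe_stream(StreamArn=stream_arn)["StreamDescription"]
    for shard in description["Shards"]:
        print(shard["ShardId"], "parent:", shard.get("ParentShardId"))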

4) In this question, I found this:

The amount of shards that your stream has, is based on the amount of partitions the table has. So if you have a DDB table with 4 partitions, then your stream will have 4 shards. Each shard corresponds to a specific partition, so given that all items with the same partition key should be present in the same partition, it also means that those items will be present in the same shard.

Does this mean that I can achieve what I need automatically, because all items with the same partition key will be present in the same shard? And does Lambda respect this?
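For completeness, this is how I would wire the stream to the function, sketched with boto3 (the function name process-chat-messages is a placeholder). My understanding is that Lambda polls each shard with its own sequential reader and delivers batches in shard order:

    import boto3

    dynamodb = boto3.client("dynamodb")
    lambda_client = boto3.client("lambda")

    # Stream ARN of the hypothetical ChatMessages table from the earlier sketch.
    stream_arn = dynamodb.describe_table(TableName="ChatMessages")["Table"]["LatestStreamArn"]

    # One event source mapping for the whole stream; Lambda reads each shard
    # sequentially and invokes the function with batches in shard order.
    lambda_client.create_event_source_mapping(
        EventSourceArn=stream_arn,
        FunctionName="process-chat-messages",
        StartingPosition="TRIM_HORIZON",
        BatchSize=100,
    )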

5) From the FAQ:

The ordering of records across different shards is not guaranteed, and processing of each shard happens in parallel.

I don't care about global ordering, just the per-room (logical) ordering from my example. Still, this FAQ answer doesn't make it clear whether the shards group records by logical key.

Best Answer

Does this answer help?

https://stackoverflow.com/questions/44266633/how-do-dynamodb-streams-distribute-records-to-shards

The ordering of records across different shards is not guaranteed, and processing of each shard happens in parallel.
