Distributed transactions with Kafka

apache-kafka message-queue soa transaction

I need to implement a transaction that spans loosely coupled (SOA + MOM) components.

Distributed transaction sketch

When a particular event is received, FOO and BAR need to perform a transactional operation: either both transactions (on FOO-DB and on BAR-DB) succeed, or neither does. Consistency here is very important.

  • When the event is received (1), FOO does an operation on its database (2).
  • If that operation fails, nothing more happens.
  • If it succeeds, a message is sent to BAR through a MOM (Kafka) (3).
  • BAR then does an operation on another database (4).
  • If that operation fails, the operation previously done on FOO-DB must be reverted (5, 6).
  • If it succeeds, all is fine.
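The flow above amounts to a forward operation plus a compensating action on failure. A minimal, broker-free sketch in Python (the in-memory dicts stand in for FOO-DB and BAR-DB, and the direct function call stands in for the Kafka topic; all names are illustrative):

```python
# Sketch of the FOO -> BAR flow with a compensating action.
# Dicts stand in for the two databases; a direct call stands in
# for the Kafka message. All names here are illustrative.

foo_db = {}
bar_db = {}

def foo_apply(event_id, value):
    foo_db[event_id] = value              # (2) FOO's local transaction

def foo_compensate(event_id):
    foo_db.pop(event_id, None)            # (5, 6) revert FOO's operation

def bar_apply(event_id, value, fail=False):
    if fail:
        raise RuntimeError("BAR-DB operation failed")
    bar_db[event_id] = value              # (4) BAR's local transaction

def handle_event(event_id, value, bar_fails=False):
    foo_apply(event_id, value)
    try:
        # (3) in the real system this message travels through Kafka,
        # so the failure can only be reported asynchronously
        bar_apply(event_id, value, fail=bar_fails)
    except RuntimeError:
        foo_compensate(event_id)          # compensate, not a real rollback
        return False
    return True

print(handle_event("e1", 42))                 # True: both sides committed
print(handle_event("e2", 7, bar_fails=True))  # False: FOO's write reverted
```

Note that the compensation is a second, separate write, not a rollback: between FOO's write and the compensation, other readers of FOO-DB can observe the not-yet-reverted state.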

Right now I'm using Kafka. I like its simplicity and speed, but I'm open to considering other solutions if they would make this situation easier to implement, maintain, or extend.

I'm quite new to SOA and MOM architectures and patterns, so I'm wondering:

  • is this a common scenario/pattern?
  • how is this commonly implemented?
  • are the simple primitives offered by Kafka enough to implement this reliably, or would I be better off considering other solutions?
  • is the distributed transaction manager usually provided by the MOM or by the database? And if it is the database, how can this be done across different DBs?

Sorry for the many questions; I hope they're not too many for a single post.
Thank you!

Best Answer

I'm posting this as an answer because it's too long for a comment, but it does not give a workable solution to the OP's problem; it only explains why what he wants to do is, AFAIK, impossible to implement 100% correctly.

You mention FOO-DB twice in your comment but never BAR-DB. Did you mean BAR-DB the second time? I'm going to assume you did.

Anyway, here's why what you wrote won't work: you update FOO-DB with a "not validated" tag and it succeeds; you then update BAR-DB; once you confirm that update succeeded, you remove the "not validated" tag from FOO-DB (which is just another update). But between the time you update BAR-DB and the time you mark the FOO-DB record as validated, your combined databases are in an inconsistent state.
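To make the window concrete, here is a sketch of that "not validated" tag scheme, assuming the FOO-DB record carries a status field (all names illustrative):

```python
# Sketch of the "not validated" tag scheme. Between step 2 and
# step 3 a reader sees BAR-DB already updated while the FOO-DB
# record is still "pending", so the combined state is inconsistent
# for an unknown window of time.

foo_db = {}
bar_db = {}

def tagged_update(key, value):
    foo_db[key] = {"value": value, "status": "pending"}  # step 1
    bar_db[key] = value                                  # step 2
    # --- inconsistency window: bar_db is updated, foo_db still pending ---
    foo_db[key]["status"] = "validated"                  # step 3

tagged_update("x", 10)
print(foo_db["x"]["status"], bar_db["x"])  # validated 10
```

The tag narrows nothing; it just moves the inconsistency from "FOO written, BAR not yet" to "BAR written, tag not yet cleared".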

In essence, you have only delayed the problem by one round. This is similar to the Two Generals problem, though not exactly the same. Even if we assume communication between the two nodes is guaranteed (because you're using Kafka), there is always an unspecified and unknown delay, which makes an Atomic and Consistent update across the combined databases impossible.

I would like to point out that I am no database expert, but this was, for example, a problem where I worked, using the MongoDB cluster we had set up. Because we had multiple nodes for each database (replication), if you updated a record and then read it soon afterwards, there was no guarantee you would read the updated version. Writes took some (usually small, but) unknown amount of time to propagate to all the replicas.

You haven't actually specified which databases you're using, so I don't know the exact guarantees they provide; but since they sound like two entirely separate clusters, no MOM you put between them can make them coordinate perfectly.

The only solution I can think of is to lock both databases, perform your updates, and then unlock both. Obviously this is not viable, but it is a thought experiment showing the kind of extreme measures it would take to accomplish this.
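As a thought experiment only, that lock-both approach could be sketched like this (`threading.Lock` objects stand in for a global lock on each database cluster; holding a global lock across two remote clusters for every write is exactly what makes this non-viable in practice):

```python
import threading

# Locks standing in for a global lock on each database cluster.
foo_lock = threading.Lock()
bar_lock = threading.Lock()

foo_db, bar_db = {}, {}

def locked_update(key, value):
    # Always acquire in a fixed order to avoid deadlock.
    with foo_lock:
        with bar_lock:
            foo_db[key] = value
            bar_db[key] = value  # no reader can observe a half-done state

locked_update("k", 99)
print(foo_db["k"], bar_db["k"])  # 99 99
```

Any other writer or reader that respects the same locks sees either neither write or both, which is the atomicity being asked for, at the cost of serializing all access to both clusters.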
