Synchronization of data across microservices

domain-driven-design, message-queue, microservices, synchronization

We have 2 to 3 dozen microservices that serve our customers.
These services are deployed in a Kubernetes cluster, and they're only accessible to the outside world through 3 or 4 API gateways.

We found that sometimes the same data is needed by two or more microservices, so we evaluated a couple of strategies to solve this problem and have implemented a solution in pieces.
As with any design, we are not 100% sure that we're taking the right approach, or whether we're missing potential pitfalls.

Case 1:
When a service of lesser business importance (ServiceL) needs data from a service of higher business importance (ServiceH), then
ServiceL calls ServiceH directly to get the necessary data.
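
For illustration, a minimal sketch of Case 1 in Python, assuming a hypothetical REST endpoint /customers/{id} on ServiceH and in-cluster DNS between the services (all names here are made up for the example):

```python
import requests

# Hypothetical in-cluster address of ServiceH; adjust to your service discovery.
SERVICE_H_URL = "http://service-h.default.svc.cluster.local"

def fetch_customer(customer_id: str) -> dict:
    """ServiceL fetches a record from ServiceH with a synchronous HTTP call."""
    # A timeout is essential: ServiceL now depends on ServiceH being up, and
    # without one a slow ServiceH can tie up all of ServiceL's workers.
    response = requests.get(f"{SERVICE_H_URL}/customers/{customer_id}", timeout=2.0)
    response.raise_for_status()  # surface 4xx/5xx instead of silently continuing
    return response.json()
```

Because this couples ServiceL's availability to ServiceH's, a timeout (as above) and possibly a circuit breaker are worth having from day one.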

Case 2:
When a service of lesser business importance (ServiceL) needs data from many important services (ServiceH1, ServiceH2, etc.), then
ServiceH1, ServiceH2 publish messages with that data to RabbitMQ.
The publishing of messages is fire-and-forget, so the services are not blocked.
ServiceL subscribes to these messages and stores the data in its own data store.
We are okay with the delay in the data becoming available to ServiceL.
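
A sketch of Case 2 using the pika client, assuming a hypothetical fanout exchange named customer-events; the publisher and consumer would live in different services, but are shown together here:

```python
import json
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters("rabbitmq"))
channel = connection.channel()
channel.exchange_declare(exchange="customer-events", exchange_type="fanout")

# --- Publisher side (ServiceH1): fire-and-forget, the call does not block ---
def publish_customer_updated(customer: dict) -> None:
    channel.basic_publish(
        exchange="customer-events",
        routing_key="",                                    # fanout ignores the key
        body=json.dumps(customer),
        properties=pika.BasicProperties(delivery_mode=2),  # persist message to disk
    )

# --- Consumer side (ServiceL): subscribe and store in ServiceL's own data store ---
def upsert_into_local_store(customer: dict) -> None:
    # Hypothetical stand-in for ServiceL's real persistence layer.
    print("stored", customer["id"])

channel.queue_declare(queue="service-l.customers", durable=True)
channel.queue_bind(queue="service-l.customers", exchange="customer-events")

def on_message(ch, method, properties, body):
    customer = json.loads(body)
    upsert_into_local_store(customer)
    ch.basic_ack(delivery_tag=method.delivery_tag)  # ack only after a successful write

channel.basic_consume(queue="service-l.customers", on_message_callback=on_message)
channel.start_consuming()
```

Acknowledging only after the local write succeeds means a crashed consumer re-receives the message instead of losing it.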

Case 3:
When a service of higher business importance (ServiceH) needs data from a less-important service (ServiceL), then ServiceL publishes a message with that data to RabbitMQ via a fire-and-forget or blocking mechanism, depending on the urgency of syncing the data.
ServiceH consumes the message and stores it in its data store.
Often the data is needed by ServiceH for reports and summaries, and we are okay with the summaries not being perfectly up to date at all times (eventually consistent).
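
For the blocking variant in Case 3, one common mechanism is RabbitMQ publisher confirms, where the publish call blocks until the broker accepts the message; a sketch with pika, again with made-up exchange and payload names:

```python
import json
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters("rabbitmq"))
channel = connection.channel()
channel.exchange_declare(exchange="report-data", exchange_type="fanout")
channel.confirm_delivery()  # publisher confirms: basic_publish now blocks until acked

def publish_report_row(row: dict) -> None:
    try:
        channel.basic_publish(
            exchange="report-data",
            routing_key="",
            body=json.dumps(row),
            properties=pika.BasicProperties(delivery_mode=2),
            mandatory=True,  # fail loudly if no queue is bound to the exchange
        )
    except (pika.exceptions.UnroutableError, pika.exceptions.NackError):
        # The broker rejected or could not route the message; retry or alert here
        # instead of silently losing data that ServiceH's reports depend on.
        raise
```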

Case 4:
When data is needed by two services and both of them not only read it but also modify it, we believe the domain boundaries were identified incorrectly; in that case we redesign the services, often merging the two microservices into one.

Additional Info for Case 2 & 3:
When we use a messaging framework like RabbitMQ to sync data across services, we have observed that over time the data drifts out of sync between services.
When data gets out of sync, we could inspect RabbitMQ's statistics and replay messages, but we believe this introduces unnecessary complexity.
We've ended up running a job once a day that syncs the data from the source service to the destination service, retrieving the data from the applicable services' APIs rather than directly from their data stores.
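
A sketch of such a nightly reconciliation job, assuming both services expose hypothetical paged REST endpoints; note that it reads and writes through the services' APIs, never their data stores:

```python
import requests

SOURCE_URL = "http://service-h"  # hypothetical source-of-truth service
DEST_URL = "http://service-l"    # hypothetical destination service

def reconcile_customers() -> None:
    """Run once a day: page through the source service's records and
    idempotently upsert each one into the destination service."""
    page = 1
    while True:
        resp = requests.get(f"{SOURCE_URL}/customers",
                            params={"page": page}, timeout=10)
        resp.raise_for_status()
        customers = resp.json()
        if not customers:
            break  # past the last page
        for customer in customers:
            # PUT is idempotent, so the job is safe to re-run after a failure.
            requests.put(f"{DEST_URL}/internal/customers/{customer['id']}",
                         json=customer, timeout=10).raise_for_status()
        page += 1
```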

Is this a good approach to sync data between microservices? Are there any pitfalls?

Best Answer

I would take a look at one of Microsoft's newer projects, code-named "Ambrosia" (the link will take you to their GitHub page, where the project is being developed open source), which focuses on providing a solution to this exact problem and to several other major data-consistency problems encountered when developing distributed services.

The cliff-notes version is that they provide Virtual Resiliency, which they describe as follows:

Virtual Resiliency is a mechanism in a (possibly distributed) programming and execution environment, typically employing a log, which exploits the replayably deterministic nature and serializability of an application to automatically mask failure.

One of the key benefits of utilizing the Ambrosia project is that you are provided with a layer of abstraction over all of the transient-fault handling and data-consistency problems that arise from the transport layer's unreliability. This means that your developers do not have to write any fault-handling or data-consistency code themselves, as the underlying Ambrosia framework manages all of those cross-cutting issues, as well as handling the reconnection of any dropped connections (tunnels, SSH, etc.).

All of the information below is taken straight from the project's GitHub page, and you can find this information, much more detail, sample use cases, etc. by following the link in the first paragraph above. I hope that this helps you out! It has been working great for the projects I'm currently running in a cloud-native context.


How it works

The figure below outlines the basic architecture of an AMBROSIA application, showing two communicating AMBROSIA services, called Immortals. Each inner box in the figure represents a separate process running as part of the Immortal. Each instance of an Immortal exists as a software object and thread of control running inside of an application process. An Immortal instance communicates with other Immortal instances through an Immortal Coordinator process, which durably logs the instance's RPCs and encapsulates the low-level networking required to send RPCs. The position of requests in the log determines the order in which they are submitted to the application process for execution and then re-execution upon recovery.

[Figure: The AMBROSIA system architecture]
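
To make the mechanism concrete, here is a deliberately simplified, conceptual Python sketch of the log-and-replay idea; it is not Ambrosia's actual API, just an illustration of why logging RPCs before executing them allows deterministic re-execution after a crash:

```python
import json
import os

class DurableLog:
    """Toy write-ahead log: every RPC is recorded before it is executed,
    and the log position fixes the (re-)execution order."""

    def __init__(self, path: str):
        self.path = path

    def append(self, rpc: dict) -> None:
        with open(self.path, "a") as f:
            f.write(json.dumps(rpc) + "\n")
            f.flush()
            os.fsync(f.fileno())  # make the record durable before executing it

    def replay(self, handler) -> None:
        # On recovery, re-execute every logged RPC in its original order.
        # Because the application is deterministic, replay reproduces the
        # exact state the service had before it failed.
        if not os.path.exists(self.path):
            return
        with open(self.path) as f:
            for line in f:
                handler(json.loads(line))
```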

In addition, the language specific AMBROSIA binding provides a state serializer. To avoid replaying from the start of the service during recovery, the Immortal Coordinator occasionally checkpoints the state of the Immortal, which includes the application state. The way this serialization is provided can vary from language to language, or even amongst bindings for the same language.
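
Continuing the toy sketch above (again, not Ambrosia's real serializer), a checkpoint just pairs the serialized application state with the log position it reflects, so recovery can load the snapshot and replay only the log suffix instead of starting from the beginning:

```python
import pickle

def save_checkpoint(state, log_offset: int, path: str = "checkpoint.bin") -> None:
    # Persist the application state together with the log position it
    # corresponds to, so already-applied RPCs are not replayed again.
    with open(path, "wb") as f:
        pickle.dump({"state": state, "log_offset": log_offset}, f)

def load_checkpoint(path: str = "checkpoint.bin"):
    with open(path, "rb") as f:
        snapshot = pickle.load(f)
    return snapshot["state"], snapshot["log_offset"]  # resume replay from here
```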
