Microservices – Synchronizing Microservice Replicas

Tags: architecture, domain-driven-design, microservices

In a microservice architecture, it's commonly accepted that a microservice may keep its own replica of data owned by another "main" microservice, which acts as the single source of truth. This keeps microservices autonomous and loosely coupled.

When the "main" microservice's data changes, it emits events so that interested microservices can update their own replicas and stay in sync.
By using a broker with persistent queues, we can "almost" guarantee that no events get lost and that replicas stay up to date.
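
For concreteness, a replica-maintaining consumer might look roughly like the sketch below; Kafka is assumed as the broker, and the topic name and CustomerReplicaDao are purely illustrative.

```java
// A minimal sketch of a replica-maintaining consumer, assuming Kafka as the
// broker; "customer-events" and CustomerReplicaDao are illustrative.
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class CustomerReplicaConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "billing-service");  // one consumer group per replica owner
        props.put("enable.auto.commit", "false");  // acknowledge only after the replica is updated
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        CustomerReplicaDao replica = new CustomerReplicaDao(); // hypothetical local store

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("customer-events"));
            while (true) {
                for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofSeconds(1))) {
                    replica.apply(record.key(), record.value()); // upsert/delete in the local copy
                }
                consumer.commitSync(); // commit offsets only once the replica reflects the events
            }
        }
    }
}

class CustomerReplicaDao {
    void apply(String customerId, String eventJson) {
        // Persist the change to the local replica (e.g. a DB upsert); stubbed here.
        System.out.printf("apply %s -> %s%n", customerId, eventJson);
    }
}
```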

But "quite" is not 100%, and there still are other ways of getting ouf of sync :

  • If a newly developed microservice joins the system, how does it build its replica, given that all past events have already been emitted? Should we reintroduce some kind of (anti-pattern) synchronous communication to query the whole dataset from the "main" microservice and build the replica?
  • When should this process take place? At system startup (on a device)? Periodically, in a cloud architecture?
  • Should we block the whole system until this synchronization is over? (We don't want the "main" microservice to emit new events while we are synchronizing, because then we risk getting out of sync again.)

How did you solve these problems in your implementations?
I've seen the concept of "reconciliation" mentioned somewhere, but I couldn't find any implementation of it.

Many thanks!

Best Answer

First off, no, your microservice shouldn't have a copy of another microservice's data. Each microservice should only own its own data and make calls out to other APIs if required, although designing your system to avoid lots of those calls is key.

However, the need to replay past or missed events to catch up is a common problem in event-driven systems.

Solution 1.

Add a "get past events" API in addition to the push messages. This allows catch-up, checking for missed messages, and other scenarios. It's not really an anti-pattern unless you are forced to use it so much that you are basically admitting the push messages can't be trusted to work.

I think it's fairly common to add such an interface and have some sort of check/sync job which audits your overall system on a schedule. Say, for example, you have jobs which haven't moved on within the expected SLA; it might be because a message was missed or errored, so you might want to poll for missed messages on these delayed jobs. A sketch of such an endpoint follows.
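
As a rough illustration, such an endpoint could look something like the following; the Spring-style controller, DomainEvent, and EventStore are assumptions made for the sketch, not a prescribed design.

```java
// A minimal sketch of a "get past events" API, assuming Spring Web and a
// hypothetical append-only EventStore that pages events by sequence number.
import java.util.List;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

record DomainEvent(long sequence, String type, String payload) {}

interface EventStore {
    List<DomainEvent> findEventsAfter(long afterSequence, int limit);
}

@RestController
@RequestMapping("/events")
class EventCatchUpController {

    private final EventStore eventStore;

    EventCatchUpController(EventStore eventStore) {
        this.eventStore = eventStore;
    }

    // Consumers pass the last sequence number they processed and receive
    // everything emitted since then, in order, one page at a time.
    @GetMapping
    List<DomainEvent> eventsSince(@RequestParam("after") long afterSequence,
                                  @RequestParam(value = "limit", defaultValue = "500") int limit) {
        return eventStore.findEventsAfter(afterSequence, limit);
    }
}
```

A scheduled audit job can then call this endpoint with the last sequence number it saw and compare the result against its replica, which is essentially the "reconciliation" idea mentioned in the question.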

Solution 2.

Change from a queue-based system to a streaming platform like Kafka. A streaming platform retains the event log and supports replaying events from a point in time, giving you a method for catch-up and for spotting missed messages.
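
For example, with the plain Kafka consumer API you can rewind a topic to a point in time using offsetsForTimes and seek; the topic name and timestamp below are illustrative.

```java
// A minimal sketch of replaying a Kafka topic from a timestamp; the topic
// name "customer-events" and the replay instant are illustrative.
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndTimestamp;
import org.apache.kafka.common.TopicPartition;

public class ReplayFromTimestamp {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "replica-rebuild-" + System.currentTimeMillis()); // fresh group, no stored offsets
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        long replayFrom = Instant.parse("2023-01-01T00:00:00Z").toEpochMilli();

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // Assign all partitions explicitly so we control the starting offsets.
            List<TopicPartition> partitions = consumer.partitionsFor("customer-events").stream()
                    .map(p -> new TopicPartition(p.topic(), p.partition()))
                    .toList();
            consumer.assign(partitions);

            // Ask the broker for the earliest offset at or after the timestamp...
            Map<TopicPartition, Long> query = new HashMap<>();
            partitions.forEach(tp -> query.put(tp, replayFrom));
            Map<TopicPartition, OffsetAndTimestamp> offsets = consumer.offsetsForTimes(query);

            // ...and rewind each partition to it before consuming as usual.
            offsets.forEach((tp, oat) -> {
                if (oat != null) consumer.seek(tp, oat.offset());
            });

            // A real rebuild would loop until it has caught up to the live head.
            for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofSeconds(5))) {
                System.out.printf("replayed %s -> %s%n", record.key(), record.value());
            }
        }
    }
}
```

Note this only works if the topic still retains the history you need (via retention settings or log compaction); for a brand-new replica of very old data you may still need Solution 1 or 3.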

Solution 3.

Add a separate replication-of-existing-data process that you can 'manually' apply. This can be useful where you have a specific one-off process that needs the full info, say deploying a new tenant. You might need a large amount of base data before subscribing to the push messages, too much for an API, but doable with an export/import flat file on a memory stick.
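
A minimal sketch of what such an export/import pair might look like, with SourceDao and ReplicaDao as hypothetical stand-ins for the real data access code:

```java
// A minimal sketch of a one-off bulk replication as a flat file (one JSON
// document per line); SourceDao and ReplicaDao are hypothetical.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Stream;

interface SourceDao { List<String> allRecordsAsJson(); }

interface ReplicaDao { void upsertFromJson(String json); }

public class BulkReplication {

    // On the "main" service: dump every record, one JSON document per line.
    static void exportTo(SourceDao source, Path file) throws IOException {
        Files.write(file, source.allRecordsAsJson());
    }

    // On the new service/tenant: load the file, then start the event
    // subscription from a position captured *before* the export, so no
    // events fall into the gap between export and subscribe.
    static void importFrom(ReplicaDao replica, Path file) throws IOException {
        try (Stream<String> lines = Files.lines(file)) {
            lines.forEach(replica::upsertFromJson);
        }
    }
}
```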
