Microservices – Seeding Data into a New Microservice

asp.net-core microservices

I've been reading through the Architecture for Containerized .NET Applications book, and I have a question about replicating data from one service into another:

And finally (and this is where most of the issues arise when building
microservices), if your initial microservice needs data that’s
originally owned by other microservices, do not rely on making
synchronous requests for that data. Instead, replicate or propagate
that data (only the attributes you need) into the initial service’s
database by using eventual consistency (typically by using integration
events, as explained in upcoming sections).

If we deploy two services at the same time, I understand how we can replicate a subset of the data from one service into another using events, but what happens if the second service (that needs to replicate data) is deployed at a later stage when the first service already has data in its database? How can the new service "catch up" and get in sync?

Is it reasonable for the new service to use a synchronous call the first time it starts, and seed its own database? Should there be some sort of script that runs and reads from database A and writes to database B?

Best Answer

From your quote

do not rely on making synchronous requests for that data. Instead, replicate or propagate that data (only the attributes you need) into the initial service’s database by using eventual consistency

So no, it does not sound reasonable to do a synchronous big-bang import from one data store into another. At least not if we follow what the book teaches. The release of the new service should not depend on data it is not accountable for. If the data is already in its data store, good. If not, the service should be confident that at some point it will arrive. The whole point of microservices is that they can appear or disappear at any time within the system's lifespan.

Ideally, the new service gathers data as it arrives from the existing system and saves only what it needs (besides the data it generates itself), even if that data already exists in someone else's database. That's "data duplication". Each service might hold data (sometimes only chunks of it) that already exists in other domains (e.g. the customer name can be in many databases, each of which is managed by a different service).
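To make the idea concrete, here is a minimal sketch (in Python, with hypothetical event and service names) of a service handling an integration event and persisting only the attributes it cares about into its own store:

```python
# Illustrative sketch: an Ordering service handles a hypothetical
# CustomerNameChanged integration event and stores only the fields
# it needs in its own database (here, a plain dict stands in for it).

from dataclasses import dataclass
from typing import Optional

@dataclass
class CustomerNameChanged:
    customer_id: str
    name: str  # the only attribute this service cares about;
               # the source event may carry many more fields


class OrderingCustomerProjection:
    """Local, duplicated copy of customer data owned by the Ordering service."""

    def __init__(self):
        self._customers = {}  # stand-in for the service's own database

    def handle(self, event: CustomerNameChanged) -> None:
        # Upsert only the attributes this service needs (eventual consistency:
        # the local copy converges as events arrive).
        self._customers[event.customer_id] = {"name": event.name}

    def customer_name(self, customer_id: str) -> Optional[str]:
        row = self._customers.get(customer_id)
        return row["name"] if row else None
```

The point is that the handler deliberately discards everything except the fields the service's own domain requires; the authoritative copy still lives with the owning service.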

New services should not be born upon the premise that specific other services already exist. That couples them and drags them along forever, at many levels, not only the technical one.

Autonomy doesn't only involve IPC (inter-process communication); it also involves the ALM (application lifecycle management). Relying on synchronous calls might lead you to ask another team for a feature they don't have. Imagine deploying tens or hundreds of services a year: it can impact everyone's roadmap, making project management unbearable, let alone product management.

However, it's reasonable for new services to need (historical) data. This can be achieved in many ways. One could be broadcasting a sort of newServiceRegistered event so existing services can react to it and push data. How to move this data without direct communication between services is another problem to solve; I guess that's what the book introduces in the sections explaining integration events.
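As a rough sketch of that broadcast idea (all names hypothetical, and an in-memory bus standing in for a real message broker): the newcomer announces itself and the data it's interested in, and owning services replay their historical data as ordinary events.

```python
# Hypothetical sketch of the "newServiceRegistered" pattern using a
# toy in-memory event bus. In production this would be a real broker
# (RabbitMQ, Azure Service Bus, etc.), not direct method calls.

from collections import defaultdict

class EventBus:
    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, event_name, handler):
        self._handlers[event_name].append(handler)

    def publish(self, event_name, payload):
        for handler in self._handlers[event_name]:
            handler(payload)


class CustomerService:
    """Existing service that owns customer data."""

    def __init__(self, bus):
        self.customers = {"c1": {"name": "Alice"}, "c2": {"name": "Bob"}}
        self._bus = bus
        bus.subscribe("newServiceRegistered", self.on_new_service)

    def on_new_service(self, payload):
        # Replay historical data the newcomer declared interest in,
        # as ordinary integration events.
        if "customer.name" in payload["interested_in"]:
            for cid, row in self.customers.items():
                self._bus.publish(
                    "customerSnapshot",
                    {"customer_id": cid, "name": row["name"]},
                )


class NewService:
    """Newly deployed service that needs historical customer names."""

    def __init__(self, bus):
        self.local_copy = {}
        bus.subscribe(
            "customerSnapshot",
            lambda e: self.local_copy.update({e["customer_id"]: e["name"]}),
        )
        # Announce ourselves; owners react by pushing their data.
        bus.publish("newServiceRegistered", {"interested_in": ["customer.name"]})
```

No service calls another directly; the newcomer only depends on the bus and the event contracts.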

That's all the theory and it might or might not fit your current needs and expectations.

If you can provide the new service with the required data without coupling services, say with scripts executed once during the release, ETL jobs, etc., then that's fine. But bear in mind that, as new services appear, more of those scripts are needed, and figuring out how to generate them for each data store and execute them from environment to environment can be a lot of work and could end up impacting several teams. Releasing a new service is not just deploying something into the wild; there's a whole ceremony around that event. It's not rare for microservice dev teams to build their own tools to feed and operate the service. If the infrastructure doesn't provide a mechanism, they have to create it.

The goal is always the same: to be as autonomous and independent from other services/teams/management as possible.