Microservices – Handling Foreign Key Constraints When Migrating from Monolith

architecture, database, microservices, refactoring

My team is migrating from a monolithic ASP.NET application to .NET Core and Kubernetes. The code changes are going as well as can be expected, but where my team is encountering a lot of discord is around the database.

We currently have a rather large SQL Server database that houses all of the data for our entire business. I'm proposing that we split the database in a similar way to how we're splitting the code: catalog data in one (logical) database, inventory data in another, orders in another, and so on, with each microservice acting as the gatekeeper for its own database.

The implication here is that foreign keys that cross microservice boundaries would have to be removed, and sprocs and views that reach across boundaries would be prohibited. All of the data models may or may not reside in the same physical database, but even if they do, they should not interact with each other directly. Orders might still reference catalog items by Id, but data integrity would not be strictly enforced at the database level, and that data would have to be joined in code rather than in SQL.

I see the loss of these as a necessary trade-off in moving to microservices and getting the scalability benefits that come with them. As long as we choose our seams wisely and develop around them, it should be OK. Other team members are adamant that everything must stay in the same monolithic database so that everything can be ACID and referential integrity can be preserved everywhere.

This brings me to my question. First, is my stance on foreign key constraints and joins plausible? If so, is anyone aware of any credible reading material I could offer to my colleagues? Their position is nearly religious, and they don't seem like they will be swayed by anything short of Martin Fowler himself telling them they're wrong.

Best Answer

There is no clear solution because this depends entirely on your context – in particular, along which dimensions your system is supposed to scale and what your actual problems are. Is the database really your bottleneck?

This (unfortunately rather lengthy) answer will read a bit like “microservices are bad, monoliths for life!”, but that's not my intention. My point is that microservices and distributed databases can solve various problems, but not without having some issues of their own. In order to make a strong argument for your architecture, you must show that these issues do not apply or can be mitigated, and that this architecture is the best choice for your business needs.

Distributed data is difficult.

The flexibility that enables better scaling comes at the price of weaker guarantees. Notably, distributed systems are much, much harder to reason about.

Atomic updates, transactions, consistency/referential integrity, and durability are extremely valuable and should not be waived rashly. There is little point in having data if it is incomplete, out of date, or outright wrong. When you have ACID as a business requirement but are using database technology that cannot offer it out of the box (e.g. many NoSQL databases, or a DB-per-microservice architecture), then your application must fill the gap and provide those guarantees.

  • This is not impossible to do, but tricky to get right. Very tricky. Especially in a distributed setting where there are multiple writers to each database. This difficulty translates to a high chance of bugs, possibly including dropped data, inconsistent data, and so on.

    For example, consider reading the Jepsen analyses of well-known distributed database systems, perhaps starting with the analysis of Cassandra. I don't understand half of that analysis, but the TL;DR is that distributed systems are so difficult that even industry-leading projects sometimes get them wrong, in ways that can seem obvious in hindsight.

  • Distributed systems also imply a larger development effort. To a certain degree, there's a direct trade-off between development costs and dropping money on beefier hardware.
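
To make "your application must fill the gap" concrete, here is a minimal C# sketch of a compensating action standing in for the cross-database rollback that a single ACID database gives you for free. Every type and interface here (IOrderService, IInventoryService, and so on) is a hypothetical stand-in, not a real API:

```csharp
// Minimal sketch: a compensating action instead of a cross-database rollback.
// All interfaces and types below are hypothetical.
using System;
using System.Threading.Tasks;

public record OrderRequest(string ProductId, int Quantity);
public record Order(Guid Id);

public interface IOrderService
{
    Task<Order> CreateAsync(OrderRequest request);
    Task CancelAsync(Guid orderId);
}

public interface IInventoryService
{
    Task ReserveAsync(Guid orderId, string productId, int quantity);
}

public class PlaceOrderHandler
{
    private readonly IOrderService _orders;
    private readonly IInventoryService _inventory;

    public PlaceOrderHandler(IOrderService orders, IInventoryService inventory) =>
        (_orders, _inventory) = (orders, inventory);

    public async Task<Guid> PlaceOrderAsync(OrderRequest request)
    {
        // Step 1: write to the order service's own database.
        var order = await _orders.CreateAsync(request);
        try
        {
            // Step 2: write to a *different* service's database.
            await _inventory.ReserveAsync(order.Id, request.ProductId, request.Quantity);
        }
        catch
        {
            // There is no cross-database transaction to roll back, so we
            // compensate by hand. If this call fails as well, we're left with
            // exactly the inconsistent state a monolithic database prevents.
            await _orders.CancelAsync(order.Id);
            throw;
        }
        return order.Id;
    }
}
```

Note how the failure handling that the database used to do for free is now explicit application code, and how it is itself fallible.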

Example: dangling references

In practice, you should not look at computer science but at your business requirements to see whether and how ACID can be relaxed. E.g. many foreign-key relationships might not be as important as they seem. Consider a product – category n:m relationship. In an RDBMS we might use a foreign-key constraint so that only existing products and existing categories can be part of that relationship. What happens if we introduce separate product and category services, and a product or category is deleted?

In this case, that might not be a big problem, and we can write our application so that it filters out any products or categories that no longer exist. But there are trade-offs!

  • Note that this might require an application-level JOIN over multiple databases/microservices, which merely moves processing from the database server to your application. This increases total load and has to move extra data through the network. (A sketch of such a join follows after this list.)

  • This can mess with pagination. E.g. you request the next 25 products from a category, and filter out unavailable products from that response. Now your application displays 23 products. In theory, a page with zero products would also be possible!

  • You will want to occasionally run a script that cleans up dangling references, either after each relevant change or at regular intervals. Note that such scripts are fairly expensive because they have to request every product/category from the backing database/microservice to see whether it still exists.

  • This should be obvious, but for clarity: do not reuse IDs. Autoincrement-style IDs may or may not be fine. GUIDs or hashes give you more flexibility, e.g. by being able to assign an ID before the item is inserted into a database.
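
Here is what such an application-level join with dangling-reference filtering might look like in C#. Again, this is a sketch only: ICategoryService, IProductService, and all types are hypothetical stand-ins for whatever your services actually expose:

```csharp
// Sketch: an application-level join across two services, filtering out
// dangling references. All interfaces and types are hypothetical.
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public record Product(Guid Id, string Name);

public interface ICategoryService
{
    // Returns a page of product IDs linked to a category.
    Task<IReadOnlyList<Guid>> GetProductIdsAsync(Guid categoryId, int offset, int limit);
}

public interface IProductService
{
    // Returns null for products that no longer exist.
    Task<Product?> FindAsync(Guid productId);
}

public class CategoryPageLoader
{
    private readonly ICategoryService _categories;
    private readonly IProductService _products;

    public CategoryPageLoader(ICategoryService categories, IProductService products) =>
        (_categories, _products) = (categories, products);

    public async Task<IReadOnlyList<Product>> LoadPageAsync(Guid categoryId, int offset)
    {
        // The JOIN now happens here instead of in SQL: one request for the
        // IDs, then one lookup per ID (or a batch endpoint, if one exists).
        var ids = await _categories.GetProductIdsAsync(categoryId, offset, limit: 25);
        var lookups = await Task.WhenAll(ids.Select(id => _products.FindAsync(id)));

        // Filter out dangling references. Note the pagination problem from
        // the list above: we asked for 25 products but may return fewer.
        return lookups.Where(p => p is not null).Select(p => p!).ToList();
    }
}
```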

Example: concurrent orders

Now instead consider a product – order relationship. What happens to an order if a product is deleted or changed? OK, we can simply copy the relevant product data into the order entry to keep it available – trading disk space for simplicity. But what if the product's price changes or the product becomes unavailable just before an order for that product is placed? In a distributed system, effects take time to propagate, and the order will likely go through with outdated data.
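
A minimal sketch of such a denormalized order line (the field names are illustrative):

```csharp
// The copied fields are snapshots taken at order time, so later product
// changes or deletions don't corrupt existing orders. Names are illustrative.
using System;

public record OrderLine(
    Guid ProductId,     // still recorded, but no foreign key enforces it
    string ProductName, // copied at order time
    decimal UnitPrice,  // copied at order time; later price changes don't apply
    int Quantity);
```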

Again, how to approach this depends on your business requirements. Maybe the outdated order is acceptable, and you can later cancel the order if it cannot be fulfilled.

But maybe that's not an option, e.g. for highly concurrent settings. Consider 3000 people rushing to buy concert tickets within the first 10 seconds, and let's assume a change in availability will require 10ms to propagate. What is the probability of selling the last ticket to multiple people? Depends on how those collisions are handled, but using a Poisson distribution with λ = 3000 / (10s / 10ms) = 3 we get a P(k > 1) = 1 - P(k = 0) - P(k = 1) = 80% chance of collision per 10ms interval. Whether selling and later cancelling the majority of your orders is possible without committing fraud might lead to an interesting conversation with your legal department.
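Spelled out, using only the numbers from the example above (3000 buyers spread over the first 10 seconds, and a 10 ms propagation window):

```latex
\lambda = \frac{3000 \text{ buyers}}{10\,\mathrm{s} \,/\, 10\,\mathrm{ms}}
        = 3 \text{ buyers per } 10\,\mathrm{ms} \text{ window}

P(k \ge 2) = 1 - P(k{=}0) - P(k{=}1)
           = 1 - e^{-\lambda} - \lambda e^{-\lambda}
           = 1 - 4e^{-3}
           \approx 0.80
```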

Pragmatism means cherry-picking the best features.

The good news is that you don't have to move to a distributed database model, if that's not required otherwise. No one will revoke your Microservice Club membership if you don't do microservices “properly”, because there is no such club – and no one true way to build microservices.

Pragmatism wins every time, so mix and match various approaches as they solve your problem. This could even mean microservices with a centralized database. Really, don't go through the pain of distributed databases if you don't have to.

You can scale without microservices.

Microservices have two major benefits:

  • The organizational benefit that they can be developed and deployed independently by separate teams (which in turn requires the services to offer a stable interface).
  • The operational benefit that each microservice can be scaled independently.

If independent scaling is not required, microservices are a lot less attractive.

A database server is already a kind of service that you can scale (somewhat) independently, e.g. by adding read replicas. You mention stored procedures. Reducing their use might have such a large effect that any other scalability discussion is moot.

And it's perfectly possible to have a scalable monolith that includes all services as libraries. You can then scale by launching more instances of the monolith, which of course requires each instance to be stateless (see the sketch below).
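
For illustration, the composition root of such a "modular monolith" might look like this – assuming an ASP.NET Core (.NET 6+) web project; the module interfaces and implementations are hypothetical:

```csharp
// Sketch: every "service" is an in-process library behind a stable
// interface, composed into one host. Scale out by running more instances –
// which only works if each instance is stateless (session state, caches,
// etc. live in shared stores, not in process memory).
var builder = WebApplication.CreateBuilder(args);
builder.Services.AddControllers();

// Hypothetical module registrations, one per business capability.
builder.Services.AddSingleton<ICatalogService, CatalogService>();
builder.Services.AddSingleton<IInventoryService, InventoryService>();
builder.Services.AddSingleton<IOrderService, OrderService>();

var app = builder.Build();
app.MapControllers();
app.Run();
```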

This tends to work well until the monolith is too large to be reasonably deployed, or when some services have special resource requirements so that you want to scale them independently. But note that the problem domains that need extra resources won't necessarily be the ones that map to a separate data model.

Do you have a strong business case?

You are aware of the business needs of your organization, and can therefore create an argument for a database-per-microservice architecture, based on an analysis:

  • that a certain scale is required, and this architecture is the most cost-effective approach to obtain that scalability, taking into account the increased development effort for such a setup and alternative solutions; and
  • that your business requirements allow relevant ACID guarantees to be relaxed, without leading to various problems like those discussed above.

Conversely, if you are unable to demonstrate this, in particular if the current database design is able to support sufficient scale into the future (as your colleagues seem to believe), then you also have your answer.

There's also a big YAGNI component to scalability. In the face of uncertainty, it is a strategic business decision whether to build for scalability now (lower total costs if that scale is eventually needed, but with opportunity costs, and it may never be needed) or to defer some of that work (higher total costs if scale does become necessary, but with a better idea of the actual scale by then). This is not primarily a technical decision.