Multi-Tenant Applications – Handling Common Data

database-design multitenancy

I am designing a multi-tenant application which will use the same database schema for a separate DB instance per-tenant (i.e. every time I get a new customer, I create a new database for them). This approach is desirable for its simplicity in that I won't need to filter data per-tenant -> I can just select every User in dbo.Users, for example – no need to filter by CustomerId / no danger that I forget to and accidentally expose the wrong data to the wrong customer.

What I am wondering is how to handle data which is common to all databases. Postcodes (Zipcodes) would be a great example. It looks like I either need to replicate this data in each database (which will be a nightmare to maintain and will mean I am storing loads of duplicated data) OR I need to use some common database, which then prevents me from joining tables in the natural way or using e.g. EntityFramework out of the box. Neither of these sounds right, and neither seems clearly better than the other.

Does anyone have a good strategy for this?

Best Answer

"I need to use some common database which then prevents me from joining tables in the natural way, or using e.g. EntityFramework out of the box."

You can use EntityFramework.DynamicFilters for this. It allows you to put a dynamic filter on your model that will be applied to all queries (both direct queries and loading related entities).

For example, I use DynamicFilters to filter soft-deleted items. Items with DeletedOn != null will be hidden from sight.

modelBuilder.Filter("IsDeleted", (BaseEntity d) => d.DeletedOn, null);

You could use a similar approach, something like:

modelBuilder.Filter("CustomerId", (User u) => u.CustomerId, GetCurrentCustomerId());

This mostly undercuts the argument for using a separate database per tenant purely to keep customer data apart.


However, there are other considerations that can lead you to want to use a multi-tenant platform.

"I am designing a multi-tenant application which will use the same database schema for a separate DB instance per-tenant (i.e. every time I get a new customer, I create a new database for them)."

  • What happens when you have customers and then want to upgrade your database schema? Are you enforcing upgrades across the board, or are you going to allow customers to upgrade when/if they want to?

  • What happens when one customer requires a restore of their data? Do you want to be able to only restore that customer's data? (I assume so - I just wanted to point this out).

Both cases can be strong arguments for using multi-tenancy: customers can upgrade at their own pace, and can receive data restores without affecting other customers.

However, then we run into another issue: common data between different application versions. If the common data changes during an upgrade, you're going to run into trouble.

I know your example of ZIP codes is less applicable here, as they're unlikely to change between versions; but the general point still stands: some common data may indeed change between versions.

There are two solutions here; I will discuss both briefly.


1. Hosting the common data

You can keep a centralized database of common data, but I suggest hiding this behind a service. This gives you the possibility of easily returning versioned common data. Every tenant will tell you what their current version is (e.g. v1.3) and your service then ensures that it returns the common data for version 1.3.

This gives you the separation you need, but there are some drawbacks: creating the web service adds overhead, and it is effectively an external dependency that every tenant will always have to rely on.

I prefer this approach for common data which is not version-specific and instead considered "globally correct" (such as a list of ZIP codes).
However, the data needs to be reasonably large to warrant putting it into a centralized repository. If it's only 5 rows, the overhead of creating the service far outweighs the data footprint of copying those 5 rows into every tenant.
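
To make the versioned lookup in option 1 concrete, here is a minimal in-memory sketch of the core of such a service. It is only an illustration under stated assumptions: CommonDataService, Publish and GetLookupValues are hypothetical names, the lookup values are made up, and a real service would sit behind HTTP and read from the central database rather than a dictionary.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: the class and method names are illustrative assumptions,
// not part of Entity Framework or any other library.
public sealed class CommonDataService
{
    // Each published data set is stored under the application version it belongs to.
    private readonly Dictionary<Version, IReadOnlyList<string>> _valuesByVersion =
        new Dictionary<Version, IReadOnlyList<string>>();

    // Registers (or replaces) the common data that goes with one application version.
    public void Publish(Version version, IReadOnlyList<string> values)
    {
        _valuesByVersion[version] = values;
    }

    // Returns the data published for exactly the tenant's version, so a tenant on
    // v1.3 never receives data that was published for a different version.
    public IReadOnlyList<string> GetLookupValues(Version tenantVersion)
    {
        if (_valuesByVersion.TryGetValue(tenantVersion, out var values))
        {
            return values;
        }

        throw new KeyNotFoundException(
            "No common data published for version " + tenantVersion);
    }
}
```

A tenant would pass its own version, for example GetLookupValues(Version.Parse("1.3")), and get back only what was published for 1.3. Failing loudly on an unknown version is one possible design choice; falling back to the nearest older version would be another.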


2. Loading the common data into the tenant

I prefer this approach for common data which is version-specific, unless the data is so large that it becomes a problem to include it in every tenant separately.

In short, you can achieve this using (for example) database seeding in EF, which allows you to update the common data at the same time you upgrade the database schema to a newer version.

There are many ways to achieve this. I like database seeding as it ties nicely into the schema upgrade process.
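
The idea of seeding alongside a schema upgrade can be sketched without any database at all. The example below is a plain C# stand-in, not EF code: TenantDatabase, LookupValue and UpgradeTo_1_3 are hypothetical names, the seeded rows are invented for illustration, and a dictionary plays the role of a table. In EF6 Code First Migrations the corresponding hook would be the Seed method of the migrations configuration, typically combined with AddOrUpdate.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: all type names and seed rows here are illustrative assumptions.
public sealed class LookupValue
{
    public string Key { get; set; }
    public string Label { get; set; }
}

// Stand-in for one tenant's database; the dictionary plays the role of a table
// whose primary key is LookupValue.Key.
public sealed class TenantDatabase
{
    public Version SchemaVersion { get; set; } = new Version(1, 0);

    public Dictionary<string, LookupValue> LookupValues { get; } =
        new Dictionary<string, LookupValue>();
}

public static class UpgradeTo_1_3
{
    // Upgrades a single tenant: the schema version and the common data move together.
    public static void Apply(TenantDatabase db)
    {
        // Real schema changes (new tables, columns, ...) would run here.
        db.SchemaVersion = new Version(1, 3);
        Seed(db);
    }

    // Upserts by key, so running the upgrade twice leaves the same rows behind.
    private static void Seed(TenantDatabase db)
    {
        var rows = new[]
        {
            new LookupValue { Key = "OPEN", Label = "Open" },
            new LookupValue { Key = "CLOSED", Label = "Closed" },
            new LookupValue { Key = "ARCHIVED", Label = "Archived" },
        };

        foreach (var row in rows)
        {
            db.LookupValues[row.Key] = row;
        }
    }
}
```

Because each tenant runs UpgradeTo_1_3.Apply on its own database, a tenant that stays on an older version simply keeps its older copy of the common data.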


I understand why you want to centralize data - it's shared data, right? But there is a limit to what is reasonable here: not every abstraction is necessary.

As an oversimplified example, consider the idea that when your entities all have audit fields (CreatedOn, ModifiedOn, ...), you tend to abstract this in an IAuditable or AuditableEntity. That is good practice.
However, when you have three entities (Person, Country and StuffedAnimal) which all have a Name property, that doesn't mean that you should abstract this into an INamed or NamedEntity. This is no longer a reasonable argument.
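
One way to read the distinction is that the audit fields come with shared behavior, while a shared Name property does not. The sketch below illustrates that reading; AuditStamper is a hypothetical helper and the Person class is reduced to the bare minimum.

```csharp
using System;

// Hypothetical sketch: AuditStamper is an illustrative helper, not a library type.
public interface IAuditable
{
    DateTime CreatedOn { get; set; }
    DateTime ModifiedOn { get; set; }
}

public sealed class Person : IAuditable
{
    public string Name { get; set; }
    public DateTime CreatedOn { get; set; }
    public DateTime ModifiedOn { get; set; }
}

public static class AuditStamper
{
    // One routine serves every auditable entity; this shared behavior is what
    // makes the interface worth having.
    public static void Stamp(IAuditable entity, DateTime nowUtc)
    {
        // CreatedOn is only set the first time an entity is stamped.
        if (entity.CreatedOn == default(DateTime))
        {
            entity.CreatedOn = nowUtc;
        }

        entity.ModifiedOn = nowUtc;
    }
}
```

There is no comparable routine that would need to treat a Person, a Country and a StuffedAnimal alike just because each has a Name, so an INamed interface would add a type without removing any duplication.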

The same is happening here. When you apply the theory to the letter, shared data should be abstracted into a centralized point. However, by that same logic, you also shouldn't be using separate tenant databases at all, because they all share the same database schema, right?

You shouldn't apply the theory to the letter here, and should instead consider the practical application. A tenant is created specifically to run independently from the other tenants. There are several benefits to doing so, but "pure abstraction" isn't one of them. If anything, multi-tenancy deliberately refuses to abstract or share resources, so that one tenant's problem cannot become a global problem for all your customers.

If you need updates to your common data to be propagated to all tenants at all times, then option 1 is better.

If you want the ability to version your common data for your tenants, then you are better off keeping the data locally inside the tenants themselves.