I need to use some common database which then prevents me from joining tables in the natural way, or using e.g. EntityFramework out of the box.
You can use EntityFramework.DynamicFilters for this. It allows you to put a dynamic filter on your model that will be applied to all queries (both direct queries and loading related entities).
For example, I use DynamicFilters to filter soft-deleted items. Items with DeletedOn != null will be hidden from sight.
modelBuilder.Filter("IsDeleted", (BaseEntity d) => d.DeletedOn, null);
You could use a similar approach, something like:
modelBuilder.Filter("CustomerId", (User u) => u.CustomerId, GetCurrentCustomerId());
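Conceptually, a filter like this behaves as if an extra Where clause were attached to every query on that entity. The sketch below only illustrates that idea with plain LINQ over an in-memory list; it is not how DynamicFilters is implemented. `FilteredSet`, `User`, and `currentCustomerId` are hypothetical names made up for the example, and it assumes .NET 6 or later (top-level statements, records).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-in for a Users table shared by all customers.
var users = new List<User>
{
    new User("alice", 1),
    new User("bob", 2),
    new User("carol", 1),
};

// Assumption: in a real application this would come from the signed-in user.
int currentCustomerId = 1;

var set = new FilteredSet<User>(users, u => u.CustomerId == currentCustomerId);

// The query never mentions CustomerId, yet bob (customer 2) cannot appear.
foreach (var user in set.Query().OrderBy(u => u.Name))
    Console.WriteLine(user.Name);

public record User(string Name, int CustomerId);

// Wraps a data source so that every query starts from the filtered rows.
public sealed class FilteredSet<T>
{
    private readonly IEnumerable<T> _source;
    private readonly Func<T, bool> _filter;

    public FilteredSet(IEnumerable<T> source, Func<T, bool> filter)
    {
        _source = source;
        _filter = filter;
    }

    public IEnumerable<T> Query() => _source.Where(_filter);
}
```

The point of the design is that calling code cannot forget the customer restriction, because it never gets access to the unfiltered rows.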
A filter like this largely removes the argument for using multi-tenancy purely to separate customer data.
However, there are other considerations that can lead you to want to use a multi-tenant platform.
I am designing a multi-tenant application which will use the same database schema for a separate DB instance per-tenant (i.e. every time I get a new customer, I create a new database for them).
What happens when you have customers and you then wish to upgrade your database schema? Are you enforcing upgrades across the board, or are you going to allow customers to upgrade when/if they want to?
What happens when one customer requires a restore of their data? Do you want to be able to only restore that customer's data? (I assume so - I just wanted to point this out).
Both cases can be strong points in favor of multi-tenancy: customers can upgrade at their own pace, and can receive data restores without affecting other customers.
However, then we run into another issue: common data between different application versions. If the common data changes during an upgrade, you're going to run into trouble.
I know your example of ZIP codes is less applicable here, as they're not prone to being changed between versions; but the general point still stands: some common data may indeed change between versions.
There are two solutions here; I will discuss both briefly.
1. Hosting the common data
You can keep a centralized database of common data, but I suggest hiding this behind a service. This gives you the possibility of easily returning versioned common data. Every tenant will tell you what their current version is (e.g. v1.3) and your service then ensures that it returns the common data for version 1.3.
This gives you the separation you need, but there are some issues here: it costs overhead to create the web service, and it's effectively an external dependency that you're always going to have to rely on.
I prefer this approach for common data which is not version-specific and instead considered "globally correct" (such as a list of ZIP codes).
However, there needs to be a reasonable data size to warrant putting it into a centralized repository. If it's 5 fields, the overhead of creating the service far outweighs the data footprint of copying those 5 lines in all tenants.
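As a rough illustration of option 1, the sketch below keeps one copy of the common data per application version and returns the copy that matches the version a tenant reports. It is an in-memory stand-in for what would really be a web service; `CommonDataService`, `Publish`, `GetPlans`, and the plan names are all hypothetical, and it assumes .NET 6 or later.

```csharp
using System;
using System.Collections.Generic;

var service = new CommonDataService();

// Made-up common data: the list changed between v1.3 and v1.4.
service.Publish("1.3", new[] { "Standard", "Premium" });
service.Publish("1.4", new[] { "Standard", "Premium", "Enterprise" });

// A tenant still on v1.3 keeps getting the data that belongs to v1.3.
Console.WriteLine(string.Join(", ", service.GetPlans("1.3")));

public sealed class CommonDataService
{
    private readonly Dictionary<string, IReadOnlyList<string>> _plansByVersion = new();

    // Registers the common data that belongs to one application version.
    public void Publish(string version, IReadOnlyList<string> plans) =>
        _plansByVersion[version] = plans;

    // Returns the common data for the version the tenant says it is running.
    public IReadOnlyList<string> GetPlans(string tenantVersion)
    {
        if (_plansByVersion.TryGetValue(tenantVersion, out var plans))
            return plans;

        throw new KeyNotFoundException(
            $"No common data published for version {tenantVersion}.");
    }
}
```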
2. Loading the common data into the tenant
I prefer this approach for common data which is version-specific, unless the data is so large that it becomes a problem to include it in every tenant separately.
In short, you can achieve this using (for example) database seeding in EF, which allows you to update the common data at the same time you upgrade the database schema to a newer version.
There are many ways to achieve this. I like database seeding as it ties nicely into the schema upgrade process.
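To show why tying the seed to the upgrade is convenient, here is a minimal sketch in plain C#. It does not use EF's migration or seeding API; `TenantDb`, `Upgrade`, `Upgrader`, and the status values are hypothetical stand-ins, and it assumes .NET 6 or later. Each upgrade step carries the common data that belongs to its version, and the seed is written so that running it twice changes nothing.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-in for one tenant's database.
var tenant = new TenantDb();

// Each step would normally change the schema too; here it only seeds.
var upgrades = new List<Upgrade>
{
    new Upgrade(1, db => db.Seed("Active", "Closed")),
    new Upgrade(2, db => db.Seed("Active", "Closed", "Suspended")),
};

Upgrader.UpgradeTo(tenant, 1, upgrades); // this tenant upgrades to v1 first
Upgrader.UpgradeTo(tenant, 2, upgrades); // and to v2 when it is ready
Upgrader.UpgradeTo(tenant, 2, upgrades); // repeating the call is a no-op

Console.WriteLine($"v{tenant.SchemaVersion}: {string.Join(", ", tenant.Statuses)}");

public sealed class TenantDb
{
    public int SchemaVersion { get; set; }
    public SortedSet<string> Statuses { get; } = new();

    // Idempotent seed: adding a value that already exists changes nothing.
    public void Seed(params string[] statuses)
    {
        foreach (var status in statuses)
            Statuses.Add(status);
    }
}

public record Upgrade(int Version, Action<TenantDb> Apply);

public static class Upgrader
{
    // Applies, in order, every step newer than the tenant's current version.
    public static void UpgradeTo(TenantDb db, int target, IEnumerable<Upgrade> upgrades)
    {
        var pending = upgrades
            .Where(u => u.Version > db.SchemaVersion && u.Version <= target)
            .OrderBy(u => u.Version)
            .ToList();

        foreach (var step in pending)
        {
            step.Apply(db);
            db.SchemaVersion = step.Version;
        }
    }
}
```

Because the common data travels with the version, two tenants on different versions can each hold the data that matches their own schema.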
I understand why you want to centralize data - it's shared data, right? But there is a limit to what is reasonable here: not every abstraction is necessary.
As an oversimplified example, consider the idea that when your entities all have audit fields (CreatedOn, ModifiedOn, ...), you tend to abstract this in an IAuditable or AuditableEntity. That is good practice.
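For readers who have not seen this pattern, a minimal sketch of such an abstraction could look like the following. The interface name comes from the text above; the `Invoice` entity and the `Audit.Touch` helper are hypothetical, and the example assumes .NET 6 or later.

```csharp
using System;

var invoice = new Invoice { Number = "INV-1" };
Audit.Touch(invoice, DateTime.UtcNow);

// On first touch both fields receive the same timestamp.
Console.WriteLine(invoice.CreatedOn == invoice.ModifiedOn);

public interface IAuditable
{
    DateTime? CreatedOn { get; set; }
    DateTime? ModifiedOn { get; set; }
}

// Hypothetical entity; any entity with audit fields can opt in.
public class Invoice : IAuditable
{
    public string Number { get; set; } = "";
    public DateTime? CreatedOn { get; set; }
    public DateTime? ModifiedOn { get; set; }
}

public static class Audit
{
    // One piece of behavior shared by every auditable entity.
    public static void Touch(IAuditable entity, DateTime now)
    {
        entity.CreatedOn ??= now; // set only the first time
        entity.ModifiedOn = now;  // updated on every change
    }
}
```

The abstraction earns its keep because there is shared behavior (`Touch`) behind the shared fields, not just a coincidence of property names.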
However, when you have three entities (Person, Country and StuffedAnimal) which all have a Name property, that doesn't mean that you should abstract this into an INamed or NamedEntity. This is no longer a reasonable abstraction.
The same is happening here. When you apply the theory to the letter, then shared data should be abstracted into a centralized point. However, by that same logic, you also shouldn't be using multi-tenancy at all, because the tenants all share one database schema, right?
You shouldn't apply the theory to the letter here, and instead consider the practical application. A tenant is created specifically to run independently from the other tenants. There are several benefits to doing so, but "pure abstraction" isn't one of them. If anything, multi-tenancy is refusing to abstract or share resources specifically so you can prevent a local issue from becoming a global issue for all your customers.
If you need updates to your common data to be propagated to all tenants at all times, then option 1 is better.
If you want the ability to version your common data for your tenants, then you are better off keeping the data locally inside the tenants themselves.
Bottom Line Up Front: You will likely have to start with a compromise.
Micro-services and multi-tenancy are hard. You have to consider the trade-offs in the cost to build, run, and maintain your solutions. The cheapest answers are going to conflict with what makes the system more robust and secure. The challenge is to figure out where your project needs to start, and what compromises you have to accept for the moment.
There are a couple of axioms to keep in mind:
- Complexity and cost are directly related. The more complex something is, the more expensive it will be to build it and maintain it.
- Isolated systems are generally safer, but also more complex. When two tenants' data never touch, they can't affect each other.
- We are not all Facebook: most companies have to worry about cost more than about isolation and the complexity that comes with it.
When you start breaking down the different topics, you are going to find that what is more correct for one answer is less correct for another. For example, your first topic and your second topic have different answers.
Maintainability
One thing is easier to maintain than several things. That goes even more for your database.
With one large shared database cluster, the following are going to be easier to manage:
- Backup/Restore
- Load balancing a cluster
At least they will up to a point. The problem you may get to is that one of your application's tenants has vastly more demands than another. If your database is a shared resource between the tenants, you will eventually run into the situation where your super users are impacting your service to the other tenants. That may not be something you have to worry about on day one.
Impact of Disasters
If your database goes down, you will need to restore the database server then restore the latest backup.
- All tenants served by the database server that went down are affected.
  - One database for all tenants means all your customers are affected.
  - Separate databases for each tenant means only that tenant is affected.
- Some databases are designed to scale out.
  - Sharding spreads the data across multiple nodes in a cluster.
  - Replication adds redundancy to your data spread across those nodes.
  - These are designed to allow a single node to be lost and replaced without any loss of data or service.
It's worth looking into databases that are designed to scale out. Examples would be Apache Cassandra, MongoDB, RavenDB, etc. Most NoSQL databases are designed around this concept. The upshot is that you have one "logical" database, but multiple processing nodes allow you to expand capacity as you need. It might be a worthwhile compromise to simplify your data design while having the robustness and safety you need.
Feasibility of multi-tenant database approach
That's something you'll have to evaluate. The approaches you are weighing against each other are:
- One database for everything
- One database per tenant
- One database per micro-service
- One database per micro-service per tenant (the utmost in isolation)
To perform a useful analysis of alternatives, you need to define:
- Key performance areas/Requirements -- know what is important for your app
- Cost of the solution
- T-shirt size estimates of what it would take to implement each approach
Create the chart, see how each approach hits those check marks, and then make a decision. Remember the axiom about complexity and cost being directly related? The decision you have to make right now may not be what the pundits say is the most correct thing. You have to live within budget constraints. As your application brings in more revenue, your budget will increase, which will allow you to update your system in ways you can't consider right now.
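The chart described above can be reduced to a weighted sum per approach. The sketch below shows only the arithmetic. Every weight and score in it is an invented placeholder, not a recommendation, and the criterion names and `DecisionMatrix` helper are hypothetical; substitute your own requirements and estimates. It assumes .NET 6 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Placeholder weights: how much each criterion matters to this project.
var weights = new Dictionary<string, double>
{
    ["Isolation"] = 3,
    ["RunCost"] = 2,
    ["BuildEffort"] = 1,
};

// Placeholder scores from 1 (poor) to 5 (good) for two of the approaches.
var scores = new Dictionary<string, Dictionary<string, double>>
{
    ["One database for everything"] = new()
    {
        ["Isolation"] = 1, ["RunCost"] = 5, ["BuildEffort"] = 5,
    },
    ["One database per tenant"] = new()
    {
        ["Isolation"] = 4, ["RunCost"] = 3, ["BuildEffort"] = 3,
    },
};

foreach (var (name, total) in DecisionMatrix.Rank(weights, scores))
    Console.WriteLine($"{name}: {total}");

public static class DecisionMatrix
{
    // Total for an approach = sum over criteria of (weight * score).
    public static List<(string Name, double Total)> Rank(
        Dictionary<string, double> weights,
        Dictionary<string, Dictionary<string, double>> scores) =>
        scores
            .Select(o => (Name: o.Key,
                          Total: weights.Sum(w => w.Value * o.Value[w.Key])))
            .OrderByDescending(r => r.Total)
            .ToList();
}
```

With these placeholder inputs the totals are 3*4 + 2*3 + 1*3 = 21 and 3*1 + 2*5 + 1*5 = 18; changing the weights to reflect a tighter budget can easily reverse the order, which is the reason to write the weights down explicitly.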
Security
Security is a complicated topic with so many facets that, again, you have to make decisions based on the real legal requirements in your country or the demands of your clients. Below are just a few security-related concepts:
- Non-repudiation (i.e. a user cannot deny the actions they performed)
- Auditing (i.e. you can reconstruct the actions a user performed to find bad actors)
- Data protection (i.e. a user cannot see information they are not allowed to see)
- Infrastructure security (i.e. network access, file access, etc. are properly protected)
- Data encryption (i.e. a user cannot discover someone else's data by sniffing network packets)
There is even more than that. Many security aspects will be constant across your alternatives (encryption, infrastructure security, etc.). However, data protection is stronger if your database does not hold data from multiple tenants. That may not matter if the user can't access the database directly.
When dealing with security concerns, it's best to understand what you are actually required to handle:
- Are there legal requirements you need to comply with? (UK and several other countries have very strict user privacy laws, while other countries do not)
- Are there standards your clients demand?
- Are there simple and low cost things you can do to improve security?
Even when you consider user privacy laws, the security demands of a bank or health care system are going to be much greater than those needed for a social networking app.
Summary
Your team (manager included) needs to define the following:
- Requirements -- what your multi-tenant application really needs, also the security requirements
- Constraints -- budget, schedule, tools (some shops will define tools that cannot be used, and others may define tools that must be used)
- Key Performance Areas -- includes performance criteria, management support, etc.
Without those, you won't be able to settle on something that fits the unique demands of your application. The most correct thing is going to be a bit different for each application, because the unique requirements and constraints you have to work with influence what that actually is.
Best Answer
Don't put all the authentication data in the shared database.
Instead of storing tenant identity and authentication details in a separate database, you can store identity (username) and redirection details (server/instance.databasename). Then you can handle authentication at the tenant's database.
I'm assuming admins are going to send invitations to their users. All of the account information is predominantly handled in their own database. Just use the identification and redirection database to help in managing duplicate usernames. When a new account is created, you just need to create a record here with the database identity information (this should be part of the create-user transaction). There isn't anything highly sensitive in this shared part of your data. You could probably redirect a tenant to a specific application server as well. Internally, if you move a database (maybe a huge client gets their own server), just change the redirect records.
Salesforce.com doesn't allow duplicate user names across their entire system, so you should be able to do it as well.
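A minimal sketch of the identification and redirection store described above, assuming it only needs three operations: register a username, resolve it to a database, and repoint a database to a new server. It is an in-memory stand-in for the shared database; `TenantDirectory`, `DbLocation`, and the server and database names are hypothetical, and it assumes .NET 6 or later.

```csharp
using System;
using System.Collections.Generic;

var directory = new TenantDirectory();
directory.TryRegister("alice@example.com", new DbLocation("sql-01", "tenant_acme"));

// Usernames are unique across the whole system, so this returns false.
Console.WriteLine(directory.TryRegister("ALICE@example.com", new DbLocation("sql-01", "tenant_other")));

// Moving a tenant's database only requires rewriting the redirect records.
directory.MoveDatabase("tenant_acme", "sql-02");
Console.WriteLine(directory.Resolve("alice@example.com"));

public record DbLocation(string Server, string Database);

public sealed class TenantDirectory
{
    // No passwords here: only the username and where its data lives.
    private readonly Dictionary<string, DbLocation> _byUsername =
        new(StringComparer.OrdinalIgnoreCase);

    // Returns false when the username is already taken by any tenant.
    public bool TryRegister(string username, DbLocation location) =>
        _byUsername.TryAdd(username, location);

    // Returns null for unknown usernames; authentication itself then
    // happens against the tenant database this points to.
    public DbLocation? Resolve(string username) =>
        _byUsername.TryGetValue(username, out var location) ? location : null;

    // Repoints every user of the given database and returns how many moved.
    public int MoveDatabase(string database, string newServer)
    {
        var moved = 0;
        foreach (var username in new List<string>(_byUsername.Keys))
        {
            var location = _byUsername[username];
            if (location.Database == database)
            {
                _byUsername[username] = location with { Server = newServer };
                moved++;
            }
        }
        return moved;
    }
}
```

In a real system the registration would be written in the same transaction that creates the user in the tenant database, as noted above.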