Database – Two Database Architecture : Operational and Historical

databasedesign

I thought about an uncommon database structure and wonder if anyone has seen it in use before. It's basically using 2 databases:

  • First database hold only the data that is currently valid
  • Second database hold the history of everything that has ever been entered, updated or
    deleted in the first database

Scenario

I'm working on a project where I'm required to log everything that happens and where the data changes frequently.

Example (not the real one)

You have to do the database design for a soccer league. In this league there are players and teams. The players often switch teams.

  • First requirement : The database must hold the information necessary to play the next match. This means a list of every players, teams and in which team each player is currently.
  • Second requirement : The database must hold historical values that we will use to generate statistics. This means the list of all the players that have been part of a team or the list of all the teams a player has been a part of.

The problem

These two requirements are kinda the opposite of each other. I have tried to do everything in the same database but it doesn't make sense. The first requirement only cares about "playing the next match" while the second requirement only cares about "generating statistics".

To do everything in the same database, I went with a sort of "insert only" database using the obvious soft delete to delete/update information…

What initially seemed like an easy task, holding a list of players, teams and the current team of each players, suddenly becomes a lot harder. The application logic required to play the next match is already complicated enough but now the database has a very unhelpful design where the application is required to add "is deleted" check on every single query just to play the next match.

Would you want to be that coach that yell "all players in the team, come to me" and then 2000 players come at you. At which point, you will probably yell "all players that are not deleted in the team, come to me" (while swearing about this stupid design).

My conclusion

I came to wonder why you need to put everything in the same database. Not only does the soft delete do a poor job at logging everything unless you add many columns (time_created, who_created_it, time_deleted, who_deleted_it) but it also complicate everything. It complicates the database design and it complicates the application design.

Also, I receive these 2 requirements as part of one single application that cannot be split but I keep thinking : this is 2 completely distinct applications. Why am I trying to do everything together.

That's when I thought about splitting the database in two. An operational database that is used only to play the next match and only contain the information that is currently valid and an historical database that hold all the information that ever existed, when it was created, deleted and who did it.

The goal is to keep the first database (operational) and the application as simple as possible while having as much information as possible in the second database (historical).

Questions

  • Have you seen that design before? Does it have name?
  • Are there any obvious pitfalls that I am missing?

EDIT 2015-03-16

Current architecture

You can basically think about the whole architecture as a 2 steps process.

Step 1 :

  • Application is running and users are doing some actions
  • Each time an event happen, it is recorded automatically (audit solution) in an event table
  • Then the correct row, in the operational database is updated

Step 2 :

  • A job read the latest insertion in the event table and insert this new data in the
    historical database.
  • Users query the historical database to retrieve the information that they need.

Just from the event table, you can reconstruct the information to any point in time. The problem is that this event table is not easily queryable. This is where the historical database kicks in; to present the data into a way that it is easy to retrieve exactly what we want.

Additional problems when putting everything in the same tables

I have already expressed my concern about the added complexity of checking "is deleted" on each query. But there is another issue : integrity.

I make heavy use of foreign key and constraint to make sure that at any point in time, the data that is in my database is valid.

Let's look at an example :

Constraint : There can only be one goal keeper per team.

It's easy to add a unique index that check if there is only one goal keeper per team. But then what happen when you change the goal keeper. You still need to preserve the information about the previous one but now you have 2 goal keepers in the same teams, one active and one inactive, which contradicts your constraint.

Sure it's easy to add a check to your constraint, but it's another thing to manage and think about.

Best Answer

It happens quite often, though the history (sometimes known as audit records) is kept either in the same table, or in a separate one in the same database.

For example, I used to work with a system where any updates to a table would be implemented as an insert, the old 'current' record would have a flag set on it saying it was a historical record, and the timestamp when it was updated written to a column.

Today I work on a system where every change is written to a dedicated audit table and then the update occurs on the table.

The latter is more scalable, but not as easy to implement in a generic manner.

The easiest way to achieve your goal of making queries simple and not requiring adding the 'is current' flag is to only allow read queries via a view or stored procedure. Then you make a call to say "get all players" and the stored proc will return only current players (you can implement a 2nd procedure to return players with more control over which ones are returned). This works well for writing too. A stored procedure for updating a player can then write whatever history details are needed and update the player - without the client ever knowing what the history mechanism is. For this reason, stored procedures are better than a view which returns only current players, as it keeps the entire DB access mechanism the same for reading and writing - everything goes through a sproc.

Related Topic