Version Control – Strategy for Version Controlling Database Changes During Development

databasedesign-patternsversion control

I am redesigning a CRM tool, of sorts, from the ground up. We are creating the application and the database from scratch. In order for developers to stay out of each others' way, each developer has a local copy of the application's database rather than sharing a single dev database. This avoids the case of developer 1 making a database change that works for them but breaks the application for everyone else until developer 1 checks in their code changes.

At the suggestion of an answer to a similar such question I saw when we were getting started, we are keeping in source control a folder of sequentially numbered SQL scripts. As a developer makes changes to their database, they script out all of their changes into a SQL file and check it in together with the accompanying application code changes. In theory, that sounds good, but here are some difficulties with it:

  1. If 5.sql is the latest checked in database change script, any (usually all) developers making new database changes create a file named 6.sql because it's a first-come-first-serve type situation. But there can only be one 6.sql checked in. So you either have to check which files everyone is planning to check in before creating your next SQL file or just before you check in, check what is already there and rename your file to the next one (which is still technically subject to a conflict if two developers are checking in at the same time).
  2. When you get latest you have to go run the latest SQL files to bring your database up to date with the latest code you retrieved. This is easy to forget until you run the application and have something blow up. But the bigger problem is it's hard to remember which file you ran last – i.e. which version your database is already on.
  3. No merging or conflict detection. If 2 developers are checking in update scripts at the same time, developer 2's scripts could have conflicting changes with developer 1s new changes. E.G. Say developer 2 writes a script to change the Person.ModifiedDate column to NOT NULL, but just before developer 2 checks in, developer 1 checks in a script that changes the name of Person.ModifiedDate to Person.LastModifiedDate. Developer 2 will either just rename their file so the filename doesn't conflict and unwittingly check in what is now a broken script, or will have to follow a protocol that they cannot check in a sql script until applying all of the latest and validating their script. This is a painful process, especially if developer 2 went and applied developer 1's changes, updated their script(s) and then tried to check in only to find developer 3 has now checked in a script.

So I'm thinking of writing a console app that runs before build to grab the script files and run them against the database in order. It would just need to store the last file it ran somewhere so it knows whether there are new scripts to run the next time. That seems like it will fix #2, but does anyone use a better strategy that addresses all the issues? So far we have avoided conflicts by working on unrelated pieces of the application but as we move forward that will become harder and harder to do and the current strategy requires too much developer discipline – know it will end up causing problems.

Best Answer

These are called database migrations. You're on the right track. Let me address your issues.

  1. or just before you check in, check what is already there and rename your file to the next one (which is still technically subject to a conflict if two developers are checking in at the same time)

The first person to push to your integration branch wins, and all others must rename. It shouldn't be a big deal, and you should make sure it's as simple as renaming a file--you shouldn't have to reconfigure other things too. I don't think you should try to avoid this because the alternative is trying to predict which developer is going to finish first which will inevitably cause bottlenecks.

  1. When you get latest you have to go run the latest SQL files to bring your database up to date with the latest code you retrieved. This is easy to forget until you run the application and have something blow up. But the bigger problem is it's hard to remember which file you ran last - i.e. which version your database is already on.

This is the benefit of having a tool that runs database migrations for you. There are two very popular migration tools out there: Flyway and Liquibase. These tools keep track of which migrations have been run, and you should configure them to run when the app starts up. The tool will scan your directory for any scripts it hasn't seen yet and apply them to the database one by one. It's pretty much what you're proposing to make yourself, but it's already written and mature.

  1. No merging or conflict detection. If 2 developers are checking in update scripts at the same time, developer 2's scripts could have conflicting changes with developer 1s new changes.

The tool will help you here too. Your integration tests should run migrations on a clean database, and if there are two migrations with the same version number, the tool will throw an error. So the first person to push a version number the tests should pass, and if another person has the same version number the tests will start failing.

Related Topic