Version Control – Managing Database Contents

database-designversion control

I'm working on a web project that involves user-editable content, and I'd like to be able to do version-tracking of the actual content, which lives in a database. Basically, I want to implement wiki-style change histories.

Doing some background research, I see a lot of documentation about how to version your database schema (mine is actually already controlled), but any existing strategies about how to track your database content changes are lost in the avalanche of schema versioning stuff, at least in my searches.

I can think of a few ways to implement my own change tracking, but they all seem rather crude:

  • Save the entire row on each change, relate row back to source id with a Primary key (what I'm leaning towards currently, it's the simplest). Lots of small changes could produce a lot of table bloat, though.
  • save before/after/user/timestamp for each change, with a column name to relate the change back to the relevant column.
  • save before/after/user/timestamp with a table for each column (would result in too many tables).
  • save diffs/user/timestamp for each change with a column (this would mean that you'd have to walk the entire intervening change history to go back to a certain date).

What is the best approach here? Rolling my own seems like I'm probably reinventing someone else's (better) codebase.


Bonus points for PostgreSQL.

Best Answer

The technique I have normally used is to save the complete record, with an end_timestamp field. There is a business rule that only one row can have a null end_timestamp, and this is of course the currently active content.

If you adopt this system, I strongly recommend you add an index or constraint to enforce the rule. This is easy with Oracle, as a unique index can contain one and only one null. Other databases may be more of a problem. Having the database enforce the rule will keep your code honest.

You are quite correct that lots of small changes will create bloat, but you need to trade this off against code and reporting simplicity.

Related Topic