Database – Set Modified Date to Created Date or Null on Record Creation

Tags: database, database-design

I've been following the convention of adding created and modified columns to most of my database tables. I also have been leaving the modified column as null on record creation and only setting a value on actual modification.

The other alternative is to set the modified date to be equal to created date on record creation.

I've been doing it the former way, but I recently ran into one drawback that is seriously making me think of switching. I needed to set up a database cache dependency to find out whether any existing data has been changed or new data added. Instead of being able to do the following:

SELECT MAX(modified) FROM customer

I have to do this:

SELECT GREATEST(MAX(created), MAX(modified)) FROM customer

The downside is that this query is more complicated and slower. Also, I believe file systems usually follow the second convention, setting modified date = created date on creation.
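To make the difference concrete, here is a minimal sketch using Python's built-in sqlite3 module. The schema, the sample rows, and the choice of SQLite are assumptions for illustration only; timestamps are stored as ISO-8601 text so that string comparison matches chronological order.

```python
import sqlite3

# Hypothetical table: modified stays NULL until a row is actually edited.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer ("
    "id INTEGER PRIMARY KEY, created TEXT NOT NULL, modified TEXT)"
)
conn.executemany(
    "INSERT INTO customer (created, modified) VALUES (?, ?)",
    [
        ("2024-01-01 09:00:00", "2024-01-05 12:00:00"),  # edited once
        ("2024-01-10 08:30:00", None),                   # newer row, never edited
    ],
)

# MAX(modified) skips NULLs, so on its own it misses the newer insert.
only_modified = conn.execute("SELECT MAX(modified) FROM customer").fetchone()[0]

# With NULL-initialized rows, both columns have to be consulted.
max_created, max_modified = conn.execute(
    "SELECT MAX(created), MAX(modified) FROM customer"
).fetchone()
last_change = max(v for v in (max_created, max_modified) if v is not None)

# Simulate the modified = created convention by backfilling the NULLs;
# the single-column query is then sufficient.
conn.execute("UPDATE customer SET modified = created WHERE modified IS NULL")
after_backfill = conn.execute("SELECT MAX(modified) FROM customer").fetchone()[0]

print(only_modified)   # 2024-01-05 12:00:00
print(last_change)     # 2024-01-10 08:30:00
print(after_backfill)  # 2024-01-10 08:30:00
```

The sketch combines the two maxima in Python rather than with GREATEST because NULL handling differs by engine: MySQL's GREATEST returns NULL if any argument is NULL, while PostgreSQL's ignores NULL arguments. On a table where no row has been edited yet, that difference matters for the two-column query.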

What are the pros and cons of the different methods? That is, what are the issues to consider?

UPDATE

Given the apparent trade-offs, I believe I'm going to go with the modified = created strategy. In addition, I was curious how other web databases handle this, and I noticed Drupal also seems to follow the modified = created convention.

Best Answer

With modified = created, if you want the latest modifications with never-edited rows included, you can rely on the modified column alone. If the modified column is initialized with null, however, you have to use COALESCE(modified, created), which would perform worse.
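As a rough illustration of the "latest activity" case, the sketch below orders the same rows under both conventions. It runs against SQLite through Python's sqlite3 module; the table layout, the name column, and the sample data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer ("
    "id INTEGER PRIMARY KEY, name TEXT, created TEXT NOT NULL, modified TEXT)"
)
conn.executemany(
    "INSERT INTO customer (name, created, modified) VALUES (?, ?, ?)",
    [
        ("alice", "2024-01-01", "2024-03-01"),  # edited after creation
        ("bob", "2024-02-01", None),            # never edited
        ("carol", "2024-04-01", None),          # never edited
    ],
)

# NULL-initialized convention: fall back to created when modified is NULL.
latest_null_style = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM customer ORDER BY COALESCE(modified, created) DESC"
    )
]

# modified = created convention (simulated by backfilling):
# the modified column can be used directly.
conn.execute("UPDATE customer SET modified = created WHERE modified IS NULL")
latest_equal_style = [
    row[0] for row in conn.execute("SELECT name FROM customer ORDER BY modified DESC")
]

print(latest_null_style)   # ['carol', 'alice', 'bob']
print(latest_equal_style)  # ['carol', 'alice', 'bob']
```

Both queries return the same ordering; the only difference is whether the fallback has to be written into every query.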

With modified = created, if you want modifications with never-edited rows excluded, you simply filter with where modified != created. With modified initialized to null, you filter with where modified IS NOT NULL instead. The two perform fairly similarly, though the IS NOT NULL check is slightly faster, and its advantage grows as more records have a null modified column.
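The two filters can be compared side by side in a small sketch. The table names customer_null and customer_equal are made up for this example, and SQLite via Python's sqlite3 module is only a convenient stand-in for whatever database is actually in use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer_null (
        id INTEGER PRIMARY KEY, created TEXT NOT NULL, modified TEXT);
    CREATE TABLE customer_equal (
        id INTEGER PRIMARY KEY, created TEXT NOT NULL, modified TEXT NOT NULL);
    """
)

# Same three rows in both tables; only the first has been edited.
data = [("2024-01-01", "2024-03-01"), ("2024-02-01", None), ("2024-04-01", None)]
for created, modified in data:
    conn.execute(
        "INSERT INTO customer_null (created, modified) VALUES (?, ?)",
        (created, modified),
    )
    conn.execute(
        "INSERT INTO customer_equal (created, modified) VALUES (?, ?)",
        (created, modified or created),  # modified = created on insert
    )

# NULL-initialized convention: edited rows are the non-NULL ones.
edited_null = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM customer_null WHERE modified IS NOT NULL ORDER BY id"
    )
]

# modified = created convention: edited rows are those where the columns differ.
edited_equal = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM customer_equal WHERE modified != created ORDER BY id"
    )
]

print(edited_null)   # [1]
print(edited_equal)  # [1]
```

One thing to keep in mind with the modified != created filter: a row edited within the same timestamp resolution as its creation would look unedited, so the comparison is only as precise as the stored timestamps.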

These are really the only differences. Both give the same abilities to filter and aggregate data; you just have to use slightly different techniques for each. I prefer initializing modified = created to avoid the coalesces. However, initializing with nulls may, depending on your database system, save disk space, especially if edits are so uncommon that you'll have mostly nulls. Also, if you have mostly nulls, where modified IS NOT NULL will perform a good bit better than where modified != created, due to the smaller set meeting the condition.

Edit: Also, if you query by modification date often enough that you would put an index on this column (a pretty uncommon scenario, but I don't know your use case), an index where modified = created would have different performance characteristics than one where modified is initialized with null. With enough nulls, any coalesces would lose the benefit of the index, because wrapping the column in COALESCE pushes the query off the index.
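The index point can be checked on a small scale with SQLite's EXPLAIN QUERY PLAN. This is only a sketch under stated assumptions: the index name idx_customer_modified and the plan helper are hypothetical, and other engines plan queries differently (some support indexes on expressions, which could change the outcome).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer ("
    "id INTEGER PRIMARY KEY, created TEXT NOT NULL, modified TEXT)"
)
# Hypothetical plain index on the modified column.
conn.execute("CREATE INDEX idx_customer_modified ON customer (modified)")


def plan(sql):
    """Return the human-readable detail column of SQLite's query plan."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


# Filtering on the bare column lets the planner use the index.
direct_plan = plan("SELECT id FROM customer WHERE modified > '2024-01-01'")

# Wrapping the column in COALESCE hides it from the plain index,
# so SQLite falls back to scanning the table.
coalesce_plan = plan(
    "SELECT id FROM customer WHERE COALESCE(modified, created) > '2024-01-01'"
)

print(direct_plan)
print(coalesce_plan)
```

In SQLite the first plan names idx_customer_modified and the second does not, which matches the concern about COALESCE; the exact wording of the plan lines varies between SQLite versions.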
