Database Design – What Happened to Database Constraints?

database-designrelational-databaserules-and-constraints

When I review database models for RDBMS, I'm usually surprised to find little to no constraints (aside PK/FK). For instance, percentage is often stored in a column of type int (while tinyint would be more appropriate) and there is no CHECK constraint to restrict the value to 0..100 range. Similarly on SE.SE, answers suggesting check constraints often receive comments suggesting that the database is the wrong place for constraints.

When I ask about the decision not to implement constraints, team members respond:

  • Either that they don't even know that such features exist in their favorite database. It is understandable from programmers using ORMs only, but much less from DBAs who claim to have 5+ years experience with a given RDBMS.

  • Or that they enforce such constraints at application level, and duplicating those rules in the database is not a good idea, violating SSOT.

More recently, I see more and more projects where even foreign keys aren't used. Similarly, I've seen a few comments here on SE.SE which show that the users don't care much about referential integrity, letting the application handle it.

When asking teams about the choice not to use FKs, they tell that:

  • It's PITA, for instance when one has to remove an element which is referenced in other tables.

  • NoSQL rocks, and there are no foreign keys there. Therefore, we don't need them in RDBMS.

  • It's not a big deal in terms of performance (the context is usually small intranet web applications working on small data sets, so indeed, even indexes wouldn't matter too much; nobody would care if a performance of a given query passes from 1.5 s. to 20 ms.)

When I look at the application itself, I systematically notice two patterns:

  • The application properly sanitizes data and checks it before sending it to the database. For instance, there is no way to store a value 102 as a percentage through the application.

  • The application assumes that all the data which comes from the database is perfectly valid. That is, if 102 comes as a percentage, either something, somewhere will crash, or it will simply be displayed as is to the user, leading to weird situations.

  • While more than 99% of the queries are done by a single application, over time, scripts start to appear—either scripts ran by hand when needed, or cron jobs. Some data operations are also performed by hand on the database itself. Both scripts and manual SQL queries have a high risk of introducing invalid values.

And here comes my question:

What are the reasons to model relational databases without check constraints and eventually even without foreign keys?


For what it's worth, this question and the answers I received (especially the interesting discussion with Thomas Kilian) led me to write an article with my conclusions on the subject of database constraints.

Best Answer

It is important to distinguish between different use cases for databases.

The traditional business database is accessed by multiple independent applications and services and perhaps directly by authorized users. It is critical to have a well-thought out schema and constraints at the database level, so a bug or oversight in a single application does not corrupt the database. The database is business-critical which means inconsistent or corrupt data may have disastrous results for the business. The data will live forever while applications come and go. These are the places which may have a dedicated DBA to ensure the consistency and health of the database.

But there are also systems where the database is tightly integrated with a single application. Stand-alone applications or web application with a single embedded database. As long as the database is exclusively accessed by a single application, you could consider constraints redundant - as long as the application works correctly. These systems are often developed by programmers with a focus on application code and perhaps not a deep understanding of the relational model. If the application uses an ORM the constraints might be declared at the ORM level in a form more familiar to application programmers. In the low end we have PHP applications using MySQL, and for a long time MySQL did not support basic constraints at all, so you had to rely on the application layer to ensure consistency.

When developers from these different backgrounds meet you get a culture clash.

Into this mix we get the new wave of distributed "cloud storage" databases. It is very hard to keep a distributed database consistent without losing the performance benefit, so these databases often eschew consistency checks at the database level and basically lets the programmers handle it at the application level. Different application have different consistency requirements, and while Googles search engine prioritize availability over consistency across their servers, I'm willing to bet their payroll system runs on a relational database with lots of constraints.

Related Topic