Database Design – What Happened to Database Constraints?

database-designrelational-databaserules-and-constraints

When I review database models for RDBMS, I'm usually surprised to find little to no constraints (aside PK/FK). For instance, percentage is often stored in a column of type int (while tinyint would be more appropriate) and there is no CHECK constraint to restrict the value to 0..100 range. Similarly on SE.SE, answers suggesting check constraints often receive comments suggesting that the database is the wrong place for constraints.

When I ask about the decision not to implement constraints, team members respond:

Either that they don't even know that such features exist in their favorite database. It is understandable from programmers using ORMs only, but much less from DBAs who claim to have 5+ years experience with a given RDBMS.
Or that they enforce such constraints at application level, and duplicating those rules in the database is not a good idea, violating SSOT.

More recently, I see more and more projects where even foreign keys aren't used. Similarly, I've seen a few comments here on SE.SE which show that the users don't care much about referential integrity, letting the application handle it.

When asking teams about the choice not to use FKs, they tell that:

It's PITA, for instance when one has to remove an element which is referenced in other tables.
NoSQL rocks, and there are no foreign keys there. Therefore, we don't need them in RDBMS.
It's not a big deal in terms of performance (the context is usually small intranet web applications working on small data sets, so indeed, even indexes wouldn't matter too much; nobody would care if a performance of a given query passes from 1.5 s. to 20 ms.)

When I look at the application itself, I systematically notice two patterns:

The application properly sanitizes data and checks it before sending it to the database. For instance, there is no way to store a value 102 as a percentage through the application.
The application assumes that all the data which comes from the database is perfectly valid. That is, if 102 comes as a percentage, either something, somewhere will crash, or it will simply be displayed as is to the user, leading to weird situations.
While more than 99% of the queries are done by a single application, over time, scripts start to appear—either scripts ran by hand when needed, or cron jobs. Some data operations are also performed by hand on the database itself. Both scripts and manual SQL queries have a high risk of introducing invalid values.

And here comes my question:

What are the reasons to model relational databases without check constraints and eventually even without foreign keys?

For what it's worth, this question and the answers I received (especially the interesting discussion with Thomas Kilian) led me to write an article with my conclusions on the subject of database constraints.

Best Answer

It is important to distinguish between different use cases for databases.

The traditional business database is accessed by multiple independent applications and services and perhaps directly by authorized users. It is critical to have a well-thought out schema and constraints at the database level, so a bug or oversight in a single application does not corrupt the database. The database is business-critical which means inconsistent or corrupt data may have disastrous results for the business. The data will live forever while applications come and go. These are the places which may have a dedicated DBA to ensure the consistency and health of the database.

But there are also systems where the database is tightly integrated with a single application. Stand-alone applications or web application with a single embedded database. As long as the database is exclusively accessed by a single application, you could consider constraints redundant - as long as the application works correctly. These systems are often developed by programmers with a focus on application code and perhaps not a deep understanding of the relational model. If the application uses an ORM the constraints might be declared at the ORM level in a form more familiar to application programmers. In the low end we have PHP applications using MySQL, and for a long time MySQL did not support basic constraints at all, so you had to rely on the application layer to ensure consistency.

When developers from these different backgrounds meet you get a culture clash.

Into this mix we get the new wave of distributed "cloud storage" databases. It is very hard to keep a distributed database consistent without losing the performance benefit, so these databases often eschew consistency checks at the database level and basically lets the programmers handle it at the application level. Different application have different consistency requirements, and while Googles search engine prioritize availability over consistency across their servers, I'm willing to bet their payroll system runs on a relational database with lots of constraints.

Related Solutions

Database Design – Designing a Database with Circular References for One-of-Many Case

For a simple project I would probably adopt the second solution. On the other hand, first solution may be slightly modified to avoid your mentioned drawbacks. You could model a profile image as a special case of an image, i.e. inheritance in OOP or is-a hierarchy in relational DB.

Therefore, leaving out the profile_image column and introducing table

Profile image:
   image_id - foreign key to Image
   user_id - foreign key to User

with unique constraint on user_id and possibly dedicated primary key column + indexes on columns used for the desired use case.

Only problem may be the sorting, where you'd need keep some weights (a column in Image table) in consistence with Profile image table.

Database Design with Performance Issues and Use of Views

Is it really a big performance issue to make multiple tables with unique columns. I have heard indexing columns help quite a lot?

This depends on the DB system. Almost any relational DB system I know of automatically will add an index for primary keys. Some DB systems also add indexes automatically for each FK constraint, some others do not. For foreign key columns to be used in JOINS, it makes a lot of sense to add an index beforehand, and you should check the manual of your DB how it behaves when creating new FK constraints.

Actually, performance depends also on how your select statements look like, a classic "JOIN" should make most DBs use the index, but adding some complex logic to your select statements can break the DBs ability to use the index. For your school system, I would not waste too many thoughts into this as long as you don't suffer from any real performance problems.

Are views helpful in maintaining the relational database and making the queries faster?

Also, this depends on the DB system. Some of them have mechanism for caching or indexing views, some have not. See this SO post for detailed discussion about, for example, MS SQL server. If you really want to know it, try it out and measure. In your situation, I would not introduce any views for solving any "hypothetical" performance problems. I would only introduce views if it helps making some of your queries simpler.

And in table c I have not used separate primary key id because I want to make sure that the one student can borrow one item at a time. Is this a good practice?

To make sure that the one student can borrow one item at a time, you should use a unique contraint on the combined IDs for students and items. This has not much to do with adding a separate primary key (which can be done, or not, there are good reasons for and against doing so, but in your case, I don't expect it to make a big difference).

Best Answer

Related Solutions

Database Design – Designing a Database with Circular References for One-of-Many Case

Database Design with Performance Issues and Use of Views

Related Topic