Database Design – Should Table Relations Be Defined in Database or Code?

Tags: MySQL, rdbms, relational-database, schema

In my experience, many of the projects I have read did not define relationships in the database; they defined them only in the source code. So I'm wondering: what are the advantages and disadvantages of defining relations between tables in the database versus in source code? The broader question is about other advanced features in modern databases, such as cascades, triggers, and stored procedures. Here are some of my thoughts:

In the database:

  • Data is correct by design. This prevents application errors from producing invalid data.

  • Fewer network round trips when inserting/updating data, since the application would otherwise have to issue extra queries to check data integrity.
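As a minimal sketch of the "in the database" option, the following uses Python's built-in sqlite3 module rather than MySQL so that it runs without a server. The `authors` and `books` tables are hypothetical names chosen for illustration. It shows a foreign key rejecting an orphan row and a cascade removing dependent rows, with no integrity-checking queries issued by the application.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite leaves foreign key enforcement off unless it is enabled per connection.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE books (
        id INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        author_id INTEGER NOT NULL REFERENCES authors(id) ON DELETE CASCADE
    );
""")

conn.execute("INSERT INTO authors (id, name) VALUES (1, 'Ada')")
conn.execute("INSERT INTO books (title, author_id) VALUES ('Notes', 1)")

# A book that points at a non-existent author is refused by the database itself.
try:
    conn.execute("INSERT INTO books (title, author_id) VALUES ('Orphan', 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# Deleting the author removes the dependent books through the cascade rule.
conn.execute("DELETE FROM authors WHERE id = 1")
remaining_books = conn.execute("SELECT COUNT(*) FROM books").fetchone()[0]
```

The same idea carries over to MySQL with InnoDB tables, although the exact DDL syntax differs slightly.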

In source code:

  • More flexible.

  • Better when scaling to multiple databases, as sometimes the relation can be cross-database.

  • More control over data integrity. The database doesn't have to check every time the application modifies data (the complexity can be O(n) or O(n log n) (?)); instead, the check is delegated to the application. I also think that handling data integrity in the application leads to more verbose error messages than relying on the database. For example, suppose you build an API server and define the relations in the database. When something goes wrong (say, the referenced entity doesn't exist), you get an SQL exception with a message. The simple approach is to return a 500 "Internal server error" to the client, who then has no idea what went wrong. Alternatively, the server can parse the message to figure out what happened, which in my opinion is ugly and error-prone. If the application handles this itself, the server can generate a more meaningful message for the client.

Is there anything else?

Edit: as Kilian points out, my point about performance & data integrity is very misguided. So I edited to correct my point there. I totally understand that letting the database handle it will be a more efficient and robust approach. Please check the updated question and give some thoughts about it.

Edit: thank you, everyone. The answers I received all point out that constraints/relations should be defined in the database. :) I have one more question; since it is out of scope here, I've posted it as a separate question: "Handle database error for API server". Please leave some insights there.

Best Answer

TL;DR: Relationship constraints should go in the database.


Your application ain't big enough.

You are correct, indeed, that enforcing relationships across databases may require enforcing them in the application.

I would point out, however, that you should first check the documentation of the database software you are using, and look at existing product offerings. For example, there are clustering offerings built on top of Postgres and MySQL.

And even if you end up needing to have some validation in the application, don't throw out the baby with the bath water. After all, the less you have to do, the better off you are.

Finally, if you are worried about future scalability issues, I am afraid that your application will have to undergo significant changes before it can scale anyway. As a rule of thumb, every time you grow 10x, you have to re-design... so let's not sink too much money into failing to anticipate scalability issues, and instead use money to actually reach the point where you have those issues.

Your application ain't correct enough.

What is the chance that the database you use has a faulty implementation of the check, compared to the chance that your application has one?

And which one do you alter most often?

I'd bet on the database being correct, any time.

Your developers ain't thinking distributed enough.

"Reduce network round trip to application when insert/update data as application has to make more query(s) to check data integrity."

Red Flag! [1]

If you are thinking:

  • check if the record exists
  • if not, insert record

then you have missed the most basic concurrency issue: another process/thread might add the record between your check and your insert.

If you are thinking:

  • check if the record exists
  • if not, insert record
  • check if the record was inserted as a duplicate

then you have failed to account for MVCC: the view of the database that you have is a snapshot taken when your transaction started; it does not show updates that other transactions are making concurrently, and may not even show ones that have already committed.

Maintaining constraints across multiple sessions is a really hard problem; be glad it's solved in your database.
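The check-then-insert race can be sketched as follows. This is a deterministic simulation, not real concurrency: two sqlite3 connections to the same temporary file stand in for two sessions, and the interleaving (both check, then both insert) is forced by hand. The `users` table and the `run_interleaved` helper are hypothetical names for illustration.

```python
import os
import sqlite3
import tempfile

CHECK = "SELECT 1 FROM users WHERE email = ?"
INSERT = "INSERT INTO users (email) VALUES (?)"

def run_interleaved(schema):
    """Return (rows stored, inserts rejected) after two sessions race."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        a = sqlite3.connect(path)
        b = sqlite3.connect(path)
        a.execute(schema)
        a.commit()
        email = ("ada@example.com",)

        # Both sessions run the application-level check before either inserts,
        # so both conclude that the record does not exist yet.
        a_may_insert = a.execute(CHECK, email).fetchone() is None
        b_may_insert = b.execute(CHECK, email).fetchone() is None

        rejected = 0
        for conn, may_insert in ((a, a_may_insert), (b, b_may_insert)):
            if may_insert:
                try:
                    conn.execute(INSERT, email)
                    conn.commit()
                except sqlite3.IntegrityError:
                    # Only a database-level constraint can catch the duplicate.
                    conn.rollback()
                    rejected += 1

        stored = a.execute("SELECT COUNT(*) FROM users").fetchone()[0]
        a.close()
        b.close()
        return stored, rejected

# Application-only check: the duplicate slips through.
without_constraint = run_interleaved("CREATE TABLE users (email TEXT)")
# UNIQUE constraint in the schema: the second insert is refused.
with_constraint = run_interleaved("CREATE TABLE users (email TEXT UNIQUE)")
```

With the constraint in place, the application-level check becomes at most an optimization; the database remains the arbiter.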

[1] Unless your database properly implements the Serializable isolation level; but few actually do.


Last:

"And I think, handle data integrity in application will lead to more verbose error message than using database. Eg: when you create an API server. If you define relations in database, and something goes wrong (like the referenced entity doesn't exist), you will get an SQL Exception with message."

Do not parse error messages. Any production-grade database should return structured errors: you'll have an error code, at least, to indicate what is possibly wrong, and based on this code you can craft a suitable error message.

Note that most of the time the code is enough: if you have an error code telling you that a referenced foreign key does not exist, then it's likely that this table only has one foreign key, so you know in the code what the problem is.
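A minimal sketch of this idea, again using Python's sqlite3 module in place of MySQL: the handler relies on the exception class (and, where the Python version exposes it, the symbolic error name) rather than on the message text. The `create_order` function, the table names, and the choice of HTTP status 422 are assumptions made for illustration. Because the hypothetical `orders` table has a single foreign key and no other constraint an insert could violate, an integrity error here can only mean that the customer is missing.

```python
import sqlite3

def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    conn.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER REFERENCES customers(id)
        );
        INSERT INTO customers (id) VALUES (1);
    """)
    return conn

def create_order(conn, customer_id):
    """Return an (http_status, body) pair, as an API handler might."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO orders (customer_id) VALUES (?)", (customer_id,)
            )
        return 201, {"message": "order created"}
    except sqlite3.IntegrityError as exc:
        # The structured part of the error, not its message, drives the reply.
        # sqlite_errorname is only present on newer Python versions.
        db_code = getattr(exc, "sqlite_errorname", "SQLITE_CONSTRAINT")
        return 422, {
            "error": f"customer {customer_id} does not exist",
            "db_code": db_code,
        }

conn = make_db()
ok = create_order(conn, 1)
missing = create_order(conn, 99)
```

With a MySQL driver the shape is the same: the driver's exception carries a numeric server error code, and the handler branches on that code instead of on the text.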

Also, and let's be honest here, most of the time you will not handle errors that gracefully anyway, simply because there are so many of them that you'll fail to account for them all...

... which just ties in to the correctness point above. Each time you see a "500: Internal Server Error" because a database constraint fired and was not handled, it means the database saved you, since you just forgot to handle it in the code.
