ORM Data Validation – Should Constraints Be Enforced in Database?

databasedryormvalidation

I have always applied constraints at the database level in addition to my (ActiveRecord) models. But I've been wondering if this is really required?

A little background

I recently had to unit test a basic automated timestamp generation method for a model. Normally, the test would create an instance of the model and save it without validation. But there are other required fields that aren't nullable at the in the table definition, meaning I cant save the instance even if I skip the ActiveRecord validation. So I'm thinking if I should remove such constraints from the db itself, and let the ORM handle them?

Possible advantages if I skip constraints in db, imo –

  • Can modify a validation rule in the model, without having to migrate
    the database.
  • Can skip validation in testing.

Possible disadvantage?

If its possible that ORM validation fails or is bypassed, howsoever, the database does not check for constraints.

What do you think?

EDIT In this case, I'm using the Yii Framework, which generates the model from the database, hence database rules are generated also (though I could always write them post-generation myself too).

Best Answer

Your guiding principle should be Don’t Repeat Yourself:

In software engineering, Don't Repeat Yourself (DRY) is a principle of software development aimed at reducing repetition of information of all kinds, especially useful in multi-tier architectures. The DRY principle is stated as "Every piece of knowledge must have a single, unambiguous, authoritative representation within a system."

The ORM is essentially an extra layer (or tier if you prefer), sitting comfortably between your application and your data storage(s). Your constraints should be in one place, and one place only, be it the ORM or the data storage, otherwise soon enough you’ll end up maintaining different versions of them. You really don’t want to do that.

However, in practice, most half decent ORMs automatically generate a great deal of your models from your data schema. Although there’s still duplication, the chances for maintenance hell are minimal since the duplicated ORM code is generated following the same pattern each time. It would be ideal to have no duplicate code, but automatically generated constraints is the next best thing.

Also, having your constraints in one place doesn’t necessarily mean you should have all your constraints in the same place. Some, like referential integrity constraints, may be more fitting for the data storage (but may be lost if you move to another data storage), and some, mostly those that are about complex business logic, are more fitting for your ORM. It’d be preferable to have all your apples in the same basket, but…

Failures

You mention the ORM failing. That’s absolutely irrelevant to your question, your application should think of the ORM and the data storage(s) as a single entity. If it fails, it failed, bypassing the ORM to talk to the data storage directly is not a good idea.

Bypassing the ORM for anything else

Also not a good idea. However, it can happen for a variety of reasons:

  1. Legacy parts of the application that were build before the ORM was introduced.

    That’s a tough one, and exactly the situation I’m dealing with right now, hence my constant repeat of “maintenance hell”. Either you keep maintaining the non ORM parts, or you rewrite them to use the ORM. The second option might make more sense initially, but it’s a decision that’s solely based on what exactly those parts of your application are doing, and how valuable a complete rewrite would be in the long run.

    Try changing a key in a badly designed 2*10^8 rows MySQL table (without downtime) and you’ll understand where I’m coming from.

  2. Non legacy parts of the application that absolutely need to directly talk to the data storage:

    Even trickier. ORMs are fancy tools, and they take care of almost everything, but sometimes they just get in the way or even are absolutely useless. The buzzword (buzzphrase really) is object-relational impedance mismatch, simply put it’s not technically possible for your ORM to do everything your relational database does, and for some of the stuff they do, there’s a significant performance penalty.

Comments

From the point of data integrity, constraints MUST be on the database, and SHOULD be on the application. What if your application is accessed from a web and a desktop applications, or a mobile app, or a webservice? – Luiz Damim

This is where adding an extra layer would be extremely helpful, and if we are talking about a web application I’d go with a REST API. An overly simplistic design for this would be:

enter image description here

The ORM would sit between the API and the data storages, and everything behind the API (including it) would be considered a single entity from the various applications.

Related Topic