Object-oriented – OOP: Behavior + Data, but what about ‘validation’ behaviors


EDIT: Thanks for all the great answers! In reading these responses I realized that I was approaching this from the wrong angle, so I wrote a new question.

I had a discussion with my superior about whether validation rules should be in the domain model (specifically, in the property accessors), and I tend to lean towards putting them just about anywhere else. In my opinion, they belong in the UI and in the brokers/persistence layer.

The argument in favor of putting them in the domain is to 'protect the data': data is king, and long after the application code has been replaced, the data will still be there. Another argument is that "object oriented programming is about coupling data and behavior".

So take, for example, an object that represents a scheduled item with a StartDate and EndDate and a validation rule that says the StartDate must be before the EndDate (and the EndDate must be after the StartDate). Then, while programming this, you repeatedly run into issues as you realize things like:

- Shoot, I forgot StartDate/EndDate could be equal too, and my code errors on half the data it tries to load from the database.
- OK, so apparently the StartDate has to be set before the EndDate or my EndDate validation fails. Seems odd that I can't set properties like these in any order, but OK.
- Or… I could add more code to check whether the StartDate is null and allow the EndDate to be set to anything so long as the StartDate hasn't been set yet. OK, I'll do that.
- But wait, unless I force the user to use a constructor with 10 parameters, I can't absolutely ensure that there won't be any uninitialized properties, because my validation logic will never get hit if they don't set the properties in the first place!
- Poop, I tried forcing the consumer to use ridiculously verbose constructors, but now I'm having serialization headaches. The DataContractSerializer that WCF uses totally ignores my constructor. I'm sure there's a way around this though; I'll figure it out.
- Except now I want to use SQL sessions, and by default they use an XmlSerializer that requires a parameterless constructor.
- Uh oh, bad data has been imported into the database without going through my perfectly crafted domain model. Now my domain model prevents me from displaying this invalid data to the user, so I have to correct it all myself. It would be great if I could just allow them to correct it, but my domain model doesn't know how to do anything but crash when it reads invalid data.

So here's what I'm trying to figure out: are 'validation rules' really 'behaviors'? Technically they are, but in regard to the idea that "object oriented programming is data + behavior", does this really apply to validation?

Best Answer

Programming isn't an exact science (at least not yet), so you have a lot of leeway which you should use to apply common sense and do the sensible thing.

Every object is responsible for itself

There is a concept more fundamental than object-oriented programming, and this is encapsulation. Encapsulation could be paraphrased as “every conceptual unit is responsible for itself”, where conceptual units can be of any scale (such as libraries, classes, methods, or even simple code blocks). Encapsulation is achieved by separating inner workings and implementation details from a public interface. It is really important to carefully choose what is part of the public interface, and what should be internal.

(Note: That part about methods and blocks also being eligible for encapsulation may be confusing, and I don't cover it in the rest of the post. It ends up meaning “declare variables in the narrowest possible scope, and don't use global variables”. This is not hypothetical: I have seen people return values from functions via global variables, and it was not pretty.)
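To make the split between inner workings and public interface concrete, here is a minimal Java sketch. The class `AttendeeList` and its rules (no blank names, no duplicates) are hypothetical and only illustrate the idea: the backing `LinkedHashSet` is an implementation detail, and the only ways in are `add` (which enforces the rules) and a read-only view.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical example: the attendees of a scheduled item.
// The LinkedHashSet is an implementation detail that callers never touch directly.
final class AttendeeList {
    private final Set<String> names = new LinkedHashSet<>();

    // Public interface: the only way to change the internal state.
    // Rejects blank names; returns false if the trimmed name is already present.
    public boolean add(String name) {
        if (name == null || name.isBlank()) {
            throw new IllegalArgumentException("attendee name must not be blank");
        }
        return names.add(name.strip());
    }

    // Read-only view: callers can inspect the attendees but cannot bypass add().
    public Set<String> view() {
        return Collections.unmodifiableSet(names);
    }
}
```

Swapping the set for another collection later would not affect callers, because nothing outside the class depends on it.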

You shouldn't be able to reach invalid states

Object oriented programming is one popular strategy to make data responsible for themselves, by combining behaviour with these data. One crucially important responsibility is maintaining data consistency. That means a properly encapsulated object cannot be brought into an invalid state via the public interface, or conversely: everything that can be done via the public interface is valid. There are a number of steps to maintain consistency:

- Don't allow inconsistent objects to be created in the first place. If necessary, use the builder pattern to describe temporarily inconsistent objects.
- Don't allow data to be mutated in a way that makes them inconsistent.
  - Suggestion: Don't allow data to be mutated, at all.
- Use static typing to prove consistency properties.
- Use run-time validation to ensure consistency properties you cannot prove via the type system.
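A minimal Java sketch of several of these steps at once, assuming a hypothetical `Capacity` value with an illustrative rule of 1 to 500 seats: the only constructor validates, the field is `final` so nothing can mutate it, and "changing" the value goes back through the same constructor.

```java
// Hypothetical value type: a room capacity that must lie between 1 and 500 seats.
final class Capacity {
    private final int seats; // final: the value cannot be mutated after construction

    Capacity(int seats) {
        // Run-time validation: "an int between 1 and 500" is not expressible as a Java type.
        if (seats < 1 || seats > 500) {
            throw new IllegalArgumentException("seats must be in [1, 500], got " + seats);
        }
        this.seats = seats;
    }

    int seats() {
        return seats;
    }

    // "Changing" the capacity returns a new instance built through the same
    // validating constructor, so an invalid Capacity can never be observed.
    // Math.addExact throws ArithmeticException on int overflow instead of wrapping.
    Capacity plus(int extraSeats) {
        return new Capacity(Math.addExact(seats, extraSeats));
    }
}
```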

Static typing may seem like a no-brainer, but it's really important. When you declare fields or properties of an object, these usually have a specific type. If a certain field must contain a string, you can't put an integer there instead. The compiler then proves that your program is consistent with this requirement. However, all type systems are inherently limited and cannot prove all properties of your program. E.g. in many languages null is a value of any reference type, and the type system often can't prove that a given variable, field, or return value will always be non-null. Furthermore, most type systems lack the means to statically describe types such as “a list with at least two elements” or “a floating-point number larger than or equal to zero”.

Such requirements should be described as far as possible using static typing; anything else has to be worked around via run-time validation. This validation is still the responsibility of any conceptual unit that wants to remain consistent.
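As a sketch of pushing a requirement into the type system as far as it goes, the hypothetical record `AtLeastTwo` below encodes “a list with at least two elements” structurally: with two mandatory fields, a shorter list cannot be represented at all. Java cannot express non-nullness in its types, so that part is left to run-time checks in the compact constructor (records require Java 16 or later).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical type for "a list with at least two elements": the two mandatory
// fields make a shorter list unrepresentable, so that part is proven statically.
// Non-nullness is not expressible in Java's type system, so it is checked at run time.
record AtLeastTwo<T>(T first, T second, List<T> rest) {
    AtLeastTwo {
        Objects.requireNonNull(first, "first");
        Objects.requireNonNull(second, "second");
        // Immutable defensive copy; List.copyOf throws on a null list or null elements.
        rest = List.copyOf(rest);
    }

    int size() {
        return 2 + rest.size();
    }

    List<T> toList() {
        List<T> all = new ArrayList<>();
        all.add(first);
        all.add(second);
        all.addAll(rest);
        return all;
    }
}
```

Code that receives an `AtLeastTwo` never needs to re-check the length; only the non-null part relies on validation.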

Relaxing validation in trusted environments

In some cases, a component is only expected to be used from another trusted component (in C++ parlance: a friend). Defensive programming would still suggest that each conceptual unit be fully encapsulated. In practice, however, encapsulation is often relaxed for individual components, and validation is skipped for “trusted” data. Taken to its conclusion, this means that you only validate at the borders of your system. That system can still be encapsulated if and only if these privileged conceptual units are only reachable via a validation layer. In mainstream OOP languages this is done by making anything that doesn't have validation private or package-private, depending on where the validation border is.
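A minimal Java sketch of such a border, using a hypothetical `RoomCode` type with an assumed format of one capital letter followed by three digits: the public `parse` factory is the validation layer, while the package-private constructor skips validation and is reserved for trusted code in the same package.

```java
import java.util.regex.Pattern;

// Hypothetical value type illustrating a validation border at package level.
// In a real code base this would be a public class in its own file.
final class RoomCode {
    private static final Pattern FORMAT = Pattern.compile("[A-Z]\\d{3}"); // e.g. "B204"
    private final String code;

    // Package-private, no validation: only trusted code in the same package
    // (for example, a repository re-reading codes it stored itself) should call this.
    RoomCode(String code) {
        this.code = code;
    }

    // Public validation border: the only entry point meant for untrusted input.
    public static RoomCode parse(String raw) {
        if (raw == null || !FORMAT.matcher(raw).matches()) {
            throw new IllegalArgumentException("invalid room code: " + raw);
        }
        return new RoomCode(raw);
    }

    public String code() {
        return code;
    }
}
```

The system stays encapsulated only as long as nothing outside the package can reach the unchecked constructor.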

Encapsulation vs. your scenario

Now we have learned a bit about maintaining consistency. How can we apply it to the scenario you outlined in your question? The basis is some object with two fields StartDate and EndDate with the constraint StartDate <= EndDate.
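Here is a minimal Java sketch of that object, assuming the fields map to `startDate` and `endDate` of type `java.time.LocalDate`. The invariant is checked exactly once, in the constructor; because `isAfter` is strict, equal dates are accepted.

```java
import java.time.LocalDate;
import java.util.Objects;

// Sketch of the scheduled item with the invariant startDate <= endDate,
// checked once in the constructor. Fields are final, so it cannot be broken later.
final class ScheduledItem {
    private final LocalDate startDate;
    private final LocalDate endDate;

    ScheduledItem(LocalDate startDate, LocalDate endDate) {
        this.startDate = Objects.requireNonNull(startDate, "startDate");
        this.endDate = Objects.requireNonNull(endDate, "endDate");
        // isAfter is strict, so startDate equal to endDate passes.
        if (startDate.isAfter(endDate)) {
            throw new IllegalArgumentException(
                "startDate " + startDate + " is after endDate " + endDate);
        }
    }

    LocalDate startDate() {
        return startDate;
    }

    LocalDate endDate() {
        return endDate;
    }
}
```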

Shoot, I forgot StartDate/EndDate could be equal too and my code errors on half the data it tries to load from the database

This is great!
1. You were enforcing encapsulation.
2. You tested your code, which caught a mistake in your validation code. Thank God you tested it before using the code on a production database!
Ok so apparently the StartDate has to be set before the EndDate or my EndDate validation fails. Seems odd that I can't set properties like these in any order but ok.

This sounds like a problem with your validation code, not with validation in general. How could this problem be fixed? If StartDate == null is an illegal state, don't allow the object to get into that state. Think about using the builder pattern, proper validation in the constructor, and proper validation in the StartDate setter to enforce that it's non-null. You could also disallow changing StartDate or EndDate once the object has been created.
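A minimal Java sketch of the builder approach, with hypothetical names: the immutable `ScheduledItem` record validates itself in its compact constructor, and the nested `Builder` only holds the temporarily incomplete state. Because nothing is checked until `build()`, the setters can be called in any order.

```java
import java.time.LocalDate;
import java.util.Objects;

// Sketch: validation lives in one place (the record's constructor); the builder
// merely collects values, so setter order no longer matters.
record ScheduledItem(LocalDate startDate, LocalDate endDate) {
    ScheduledItem {
        Objects.requireNonNull(startDate, "startDate");
        Objects.requireNonNull(endDate, "endDate");
        if (startDate.isAfter(endDate)) {
            throw new IllegalArgumentException("startDate must not be after endDate");
        }
    }

    static Builder builder() {
        return new Builder();
    }

    static final class Builder {
        private LocalDate startDate;
        private LocalDate endDate;

        Builder startDate(LocalDate value) { startDate = value; return this; }
        Builder endDate(LocalDate value) { endDate = value; return this; }

        // All checks run here, after every property has been supplied.
        ScheduledItem build() {
            return new ScheduledItem(startDate, endDate);
        }
    }
}
```

For example, `ScheduledItem.builder().endDate(end).startDate(start).build()` works just as well as setting the start first.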
But wait, unless I force the user to use a constructor with 10 parameters I can't absolutely ensure that there won't be any uninitialized properties because my validation logic will never get hit if they don't call the properties in the first place!
- If your constructor has too many parameters, your object may be violating the Single Responsibility Principle. Otherwise, look into the builder pattern.
- The constructor should always return an instance which is in a valid state. There is no excuse for doing otherwise.
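One way to act on the Single Responsibility hint is to group related parameters into small types that each own their invariant. In this Java sketch, `DateRange`, `Location`, and `Meeting` are hypothetical stand-ins for whatever the real ten parameters are; the item's constructor then takes a few parts that are already known to be valid.

```java
import java.time.LocalDate;
import java.util.Objects;

// Each small type validates only its own invariant.
record DateRange(LocalDate start, LocalDate end) {
    DateRange {
        Objects.requireNonNull(start, "start");
        Objects.requireNonNull(end, "end");
        if (start.isAfter(end)) {
            throw new IllegalArgumentException("start must not be after end");
        }
    }
}

record Location(String building, String room) {
    Location {
        if (building == null || building.isBlank()) throw new IllegalArgumentException("building is required");
        if (room == null || room.isBlank()) throw new IllegalArgumentException("room is required");
    }
}

// The item's constructor shrinks to a few already-valid parts.
record Meeting(String title, DateRange when, Location where) {
    Meeting {
        if (title == null || title.isBlank()) throw new IllegalArgumentException("title is required");
        Objects.requireNonNull(when, "when");
        Objects.requireNonNull(where, "where");
    }
}
```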
Poop, I tried forcing the consumer to use ridiculously verbose constructors but now I'm having serialization headaches. The DataContractSerializer that WCF uses totally ignores my constructor. I'm sure there's a way around this though, I'll figure it out.

I am not familiar with the DataContractSerializer, but serializers generally bypass encapsulation. One solution is to serialize not the object itself but a dumb record that represents its persistent state (memento pattern).
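A serializer-agnostic Java sketch of the memento idea (no serialization library is used here; the class names are hypothetical). The memento is a plain class with public fields and an implicit no-arg constructor, which is the shape that constructor-ignoring serializers tolerate, while the domain object keeps its invariant and re-validates whenever it is restored.

```java
import java.time.LocalDate;

// Hypothetical "dumb" record of persistent state: no rules, no-arg constructor.
final class ScheduledItemMemento {
    public String startDate; // ISO-8601, e.g. "2024-05-01"
    public String endDate;
}

// The domain object keeps its invariant and converts to/from the memento.
final class ScheduledItem {
    private final LocalDate startDate;
    private final LocalDate endDate;

    ScheduledItem(LocalDate startDate, LocalDate endDate) {
        if (startDate == null || endDate == null || startDate.isAfter(endDate)) {
            throw new IllegalArgumentException("invalid dates: " + startDate + ", " + endDate);
        }
        this.startDate = startDate;
        this.endDate = endDate;
    }

    ScheduledItemMemento toMemento() {
        ScheduledItemMemento m = new ScheduledItemMemento();
        m.startDate = startDate.toString();
        m.endDate = endDate.toString();
        return m;
    }

    // Restoring goes back through the validating constructor, so a corrupted
    // memento is rejected instead of producing an invalid object.
    static ScheduledItem fromMemento(ScheduledItemMemento m) {
        return new ScheduledItem(LocalDate.parse(m.startDate), LocalDate.parse(m.endDate));
    }
}
```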
Uh oh, bad data has been imported into the database without going through my perfectly crafted domain model. Now my domain model prevents me from displaying this invalid data to the user, so I have to correct it all myself. It would be great if I could just allow them to correct it, but my domain model doesn't know how to do anything but crash when it reads invalid data.

The database itself is responsible for maintaining its data consistency (where “consistency” is meant in a slightly broader sense than in ACID). I previously talked about relaxing validation in a trusted context. It seems that here validation was relaxed without making sure that the database could only be reached through a validation layer, and therefore an unencapsulated system was created. The invalid data should never have been stored in the DB, and the user should never have been able to enter invalid data in the first place.
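A minimal Java sketch of such a validation layer in front of storage, assuming a hypothetical `ImportGateway` that is the only code path allowed to write (an in-memory list stands in for the database). Rows that fail validation are never stored; they are returned with a reason, so they could be shown to a user for correction instead of crashing the domain model later.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical import gateway: every row crosses this validation layer before storage.
final class ImportGateway {
    record RawRow(String startDate, String endDate) {}          // untrusted input, e.g. from a file
    record StoredRow(LocalDate startDate, LocalDate endDate) {} // what actually gets persisted
    record Rejected(RawRow row, String reason) {}               // kept so it can be corrected later

    private final List<StoredRow> storage = new ArrayList<>(); // stands in for the database

    List<Rejected> importRows(List<RawRow> rows) {
        List<Rejected> rejected = new ArrayList<>();
        for (RawRow row : rows) {
            if (row.startDate() == null || row.endDate() == null) {
                rejected.add(new Rejected(row, "missing date"));
                continue;
            }
            try {
                LocalDate start = LocalDate.parse(row.startDate());
                LocalDate end = LocalDate.parse(row.endDate());
                if (start.isAfter(end)) {
                    rejected.add(new Rejected(row, "start is after end"));
                } else {
                    storage.add(new StoredRow(start, end));
                }
            } catch (DateTimeParseException e) {
                rejected.add(new Rejected(row, "unparseable date"));
            }
        }
        return rejected;
    }

    List<StoredRow> stored() {
        return List.copyOf(storage);
    }
}
```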

Anyway, isn't it great that you run tests with invalid data?
