Clean Architecture – Validation in Domain vs Data Persistence Layer

business-rules, clean code, domain-model, validation

I'm studying up on Clean Architecture and, as a result, am quite dramatically rethinking a great deal of how I design and write software.

One thing I'm still wrestling with, however, is business rules like "on saving updates to some item, first load the full list of items I have permission to view/edit, confirm that this item is in the list and that the item's category is not currently locked from use (and other rules, etc.)". That is a (complex but not atypical) business rule, so it should be handled in the application domain rather than pushing business logic into the db/persistence layer.

However, it seems to me that efficiently checking these conditions is often going to be best handled with a nicely crafted db query, rather than by loading all the data into the application domain…
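To make the "nicely crafted db query" side concrete, here is a minimal sketch using Python's built-in sqlite3 module. The schema (items, permissions, categories, the locked and can_edit columns) and the function name can_save_item are hypothetical, invented only for this illustration. The point is that a single query can answer the whole rule without loading the list of editable items.

```python
import sqlite3


def can_save_item(conn: sqlite3.Connection, user_id: int, item_id: int) -> bool:
    """True if the user may edit the item and its category is not locked."""
    row = conn.execute(
        """
        SELECT EXISTS (
            SELECT 1
            FROM items i
            JOIN permissions p ON p.item_id = i.id
            JOIN categories c ON c.id = i.category_id
            WHERE i.id = ?
              AND p.user_id = ?
              AND p.can_edit = 1
              AND c.locked = 0
        )
        """,
        (item_id, user_id),
    ).fetchone()
    return bool(row[0])


def make_demo_db() -> sqlite3.Connection:
    """Build a tiny in-memory database (hypothetical schema and data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE categories (id INTEGER PRIMARY KEY, locked INTEGER NOT NULL);
        CREATE TABLE items (
            id INTEGER PRIMARY KEY,
            category_id INTEGER NOT NULL REFERENCES categories(id)
        );
        CREATE TABLE permissions (
            user_id INTEGER NOT NULL,
            item_id INTEGER NOT NULL REFERENCES items(id),
            can_edit INTEGER NOT NULL
        );
        -- Category 1 is open, category 2 is locked.
        INSERT INTO categories VALUES (1, 0), (2, 1);
        INSERT INTO items VALUES (10, 1), (20, 2);
        -- User 7 may edit both items; no other user has permissions.
        INSERT INTO permissions VALUES (7, 10, 1), (7, 20, 1);
        """
    )
    return conn
```

The rule is still expressed in one place, but it now lives in SQL, which is exactly the trade-off I'm unsure about.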

Without prematurely optimizing, what's a recommended approach, or are there some Uncle Bob articles dealing with this question? Or would he say "validate in the domain until it becomes a problem"?

I am really struggling to find any good examples / samples for anything other than the most basic of use cases.

Update:

Hi all, thanks for the replies. I should have been clearer: I've been writing (mostly web app) software for a long time, and I have already experienced and agree with all the topics you collectively describe (validate on the backend, don't trust client data, chase raw efficiency only when required while acknowledging the strengths of the db tools when available, etc.). I have gone through the developer learning lifecycle from "throw it all together" to "build a giant fat controller with N-tier applications" code trends, and I'm now really liking and investigating the clean / single-responsibility style. That is basically the result of a few recent projects whose business rules became quite clunky and widely distributed as the projects evolved and further client requirements came to light.

In particular, I'm looking at Clean-style architecture in the context of building REST APIs for client-facing as well as internal-usage functionality, where many of the business rules might be much more complex than in basically every example you see on the net (even those by the Clean / Hex architecture guys themselves).

So I guess I was really asking (and failed to state clearly) how Clean and a REST API would sit together. Most MVC stuff you see these days has incoming request validators (e.g. the FluentValidation library in .NET), but many of my "validation" rules are not so much "is this a string of less than 50 characters" as "can this user, calling this usecase/interactor, perform this operation on this collection of data, given that some related object is currently locked by Team X until later in the month, etc.": deeply involved validations where LOTS of business domain objects and domain rules are applicable.

Should I spin those rules out into a specific kind of Validator object to accompany each usecase-interactor (inspired by the FluentValidation project but with more business logic and data access involved)? Should I treat the validation somewhat like a Gateway? Should I put those validations IN a gateway (which I think is wrong)? And so on.
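To illustrate the first of those options, here is a rough sketch of a validator object paired with one usecase-interactor. Every name in it (ItemRulesGateway, SaveItemValidator, SaveItemInteractor, InMemoryRules) is hypothetical and made up for this example; it is one possible shape, not an established Clean Architecture pattern. The validator owns the business rules and asks a narrow gateway interface for the facts it needs, so the adapter behind that interface is free to answer with an efficient query.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol


class ItemRulesGateway(Protocol):
    """Hypothetical port: the narrow questions the validator needs answered."""

    def user_can_edit(self, user_id: int, item_id: int) -> bool: ...

    def category_locked(self, item_id: int) -> bool: ...


@dataclass(frozen=True)
class SaveItemRequest:
    user_id: int
    item_id: int
    name: str


class SaveItemValidator:
    """Business-rule validator that accompanies the SaveItem use case."""

    def __init__(self, rules: ItemRulesGateway) -> None:
        self._rules = rules

    def validate(self, request: SaveItemRequest) -> list[str]:
        errors: list[str] = []
        if not request.name.strip():
            errors.append("name must not be empty")
        if not self._rules.user_can_edit(request.user_id, request.item_id):
            errors.append("user may not edit this item")
        elif self._rules.category_locked(request.item_id):
            # Only checked once we know the user may edit the item.
            errors.append("item category is locked")
        return errors


class SaveItemInteractor:
    def __init__(self, validator: SaveItemValidator) -> None:
        self._validator = validator

    def execute(self, request: SaveItemRequest) -> list[str]:
        """Return validation errors; an empty list means the save may proceed."""
        errors = self._validator.validate(request)
        if errors:
            return errors
        # Persisting the item is deliberately left out of this sketch.
        return []


class InMemoryRules:
    """Stand-in adapter for tests; a real one could run a single db query."""

    def __init__(self, editable: set[tuple[int, int]], locked_items: set[int]) -> None:
        self._editable = editable
        self._locked_items = locked_items

    def user_can_edit(self, user_id: int, item_id: int) -> bool:
        return (user_id, item_id) in self._editable

    def category_locked(self, item_id: int) -> bool:
        return item_id in self._locked_items
```

With this split the rules read as domain code, while the gateway stays a dumb source of facts rather than the place where the validation itself lives.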

For reference, I am going off several articles like this, but Mattia doesn't discuss validation much.

But I guess the short answer to my question is much like the answer that I have accepted: "It's never easy, and it depends".

Best Answer

Validation of data entry is one of those things where everyone starts out trying to make it pure and clean and (if they're smart about it) eventually gives up, because there are so many competing concerns.

  • The UI layer must do some forms of validation right there on the client page/form in order to provide realtime feedback to the user. Otherwise the user spends a lot of time waiting for feedback while a transaction posts across the network.

  • Because the client often runs on an untrusted machine (e.g. in nearly all web applications), these validation routines must be executed again server side where the code is trusted.

  • Some forms of validation are implicit due to input constraints; for example, a textbox may allow only numeric entry. This means that you might not have an "is it numeric?" validator on the page, but you will still need one on the back end, somewhere, since UI constraints could be bypassed (e.g. by disabling JavaScript).

  • The UI layer must do some forms of validation at the service perimeter (e.g. server-side code in a web application) in order to insulate the system against injection attacks or other malicious forms of data entry. Sometimes this validation isn't even in your code base, e.g. ASP.NET request validation.

  • The UI layer must do some forms of validation just to convert user-entered data into a format that the business layer can understand; for example, it must turn the string "6/26/2017" into a DateTime object in the appropriate time zone.

  • The business layer should do most forms of validation because, hey, they belong in the business layer, in theory.

  • Some forms of validation are more efficient at the database layer, especially when referential integrity checks are needed (e.g. to ensure that a state code is in the list of 50 valid states).

  • Some forms of validation must occur in the context of a database transaction due to concurrency concerns, e.g. reserving a unique user name has to be atomic so some other user doesn't grab it while you are processing.

  • Some forms of validation can only be performed by third party services, e.g. when validating that a postal code and a city name go together.

  • Throughout the system, null checks and data conversion checks may occur at multiple layers, to ensure reasonable failure modes in the presence of code flaws.
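Two of the points above (converting user-entered text at the perimeter, and validation that has to happen inside a database transaction) can be sketched briefly in Python. This is only an illustration under stated assumptions: the function names, the users table, and the choice of UTC as the "appropriate" time zone are all hypothetical, and sqlite3 stands in for whatever database is actually used.

```python
import sqlite3
from datetime import datetime, timezone


def parse_us_date(text: str) -> datetime:
    """Perimeter conversion: 'M/D/YYYY' text to an aware datetime.

    Raises ValueError for malformed input. UTC is an assumption here;
    a real system would pick the time zone that fits the business rule.
    """
    return datetime.strptime(text, "%m/%d/%Y").replace(tzinfo=timezone.utc)


def make_users_db() -> sqlite3.Connection:
    """In-memory database with a UNIQUE constraint on the user name."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (username TEXT NOT NULL UNIQUE)")
    conn.commit()
    return conn


def reserve_username(conn: sqlite3.Connection, username: str) -> bool:
    """Try to claim a user name; the database decides atomically.

    A SELECT followed by an INSERT would leave a window in which another
    user could grab the name, so the UNIQUE constraint does the check.
    """
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute("INSERT INTO users (username) VALUES (?)", (username,))
        return True
    except sqlite3.IntegrityError:
        return False
```

Neither function lives in the business layer, yet both are clearly validation, which is the kind of spread the list describes.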

I have seen some developers try to codify all the validation rules in the business layer, and then have the other layers call it to extract the business rules and reconstruct the validation at a different layer. In theory this would be great because you end up with a single source of truth. But I have never, ever seen this approach do anything other than needlessly complicate the solution, and it often ends very badly.

So if you're killing yourself trying to figure out where your validation code goes, be advised: in a practical solution to even a moderately complex problem, validation code will end up going in several places.