Validation of data entry is one of those things where everyone starts out trying to make it pure and clean and (if they're smart about it) eventually gives up, because there are so many competing concerns.
The UI layer must do some forms of validation right there on the client page/form in order to provide realtime feedback to the user. Otherwise the user spends a lot of time waiting for feedback while a transaction posts across the network.
Because the client often runs on an untrusted machine (e.g. in nearly all web applications), these validation routines must be executed again server side where the code is trusted.
Some forms of validation are implicit due to input constraints; for example, a textbox may allow only numeric entry. This means that you might not have an "is it numeric?" validator on the page, but you will still need one on the back end, somewhere, since UI constraints could be bypassed (e.g. by disabling JavaScript).
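A minimal sketch of what that server-side re-check might look like. The class and method names here are illustrative, not from any framework; the point is that the server never trusts the textbox's numeric-only constraint.

```java
// Hypothetical server-side check mirroring a numeric-only textbox constraint.
public class QuantityValidator {

    /** Returns true only if the raw input is a non-empty string of digits. */
    public static boolean isValidQuantity(String raw) {
        if (raw == null || raw.isEmpty()) {
            return false;
        }
        for (int i = 0; i < raw.length(); i++) {
            if (!Character.isDigit(raw.charAt(i))) {
                return false; // rejects "-5", "1.5", "abc", or bypassed-UI garbage
            }
        }
        return true;
    }
}
```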
The UI layer must do some forms of validation at the service perimeter (e.g. server-side code in a web application) in order to insulate the system against injection attacks or other malicious forms of data entry. Sometimes this validation isn't even in your code base, e.g. ASP.NET request validation.
The UI layer must do some forms of validation just to convert user-entered data into a format that the business layer can understand; for example, it must turn the string "6/26/2017" into a DateTime object in the appropriate time zone.
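That date conversion step might look like the following in Java, using the standard `java.time` API. The pattern and zone are just examples; the caller decides which time zone is "appropriate."

```java
import java.time.LocalDate;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

public class DateEntryParser {
    // U.S.-style month/day/year entry, e.g. "6/26/2017"
    private static final DateTimeFormatter US_FORMAT =
            DateTimeFormatter.ofPattern("M/d/yyyy");

    /**
     * Converts user-entered text into a date-time at the start of that day
     * in the given zone. Throws DateTimeParseException on malformed input,
     * which is itself a validation outcome the UI layer must handle.
     */
    public static ZonedDateTime parse(String input, ZoneId zone) {
        LocalDate date = LocalDate.parse(input, US_FORMAT);
        return date.atStartOfDay(zone);
    }
}
```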
The business layer should do most forms of validation because, hey, they belong in the business layer, in theory.
Some forms of validation are more efficient at the database layer, especially when referential integrity checks are needed (e.g. to ensure that a state code is in the list of 50 valid states).
Some forms of validation must occur in the context of a database transaction due to concurrency concerns, e.g. reserving a unique user name has to be atomic so some other user doesn't grab it while you are processing.
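In a real system the atomic check-and-insert is enforced by a unique constraint inside a database transaction. As a self-contained stand-in, the same idea can be sketched with `putIfAbsent`, which performs the existence check and the insert as a single atomic step:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class UserNameRegistry {
    // Stand-in for a database table with a UNIQUE constraint on the name column.
    private final ConcurrentMap<String, String> reserved = new ConcurrentHashMap<>();

    /** Atomically reserves a name; returns false if another user already holds it. */
    public boolean tryReserve(String name, String userId) {
        // Check-and-insert as one atomic operation -- the same role a unique
        // index plays inside a database transaction. A separate "does it
        // exist?" query followed by an insert would have a race window.
        return reserved.putIfAbsent(name, userId) == null;
    }
}
```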
Some forms of validation can only be performed by third party services, e.g. when validating that a postal code and a city name go together.
Throughout the system, null checks and data conversion checks may occur at multiple layers, to ensure reasonable failure modes in the presence of code flaws.
I have seen some developers try to codify all the validation rules in the business layer, and then have the other layers call it to extract the business rules and reconstruct the validation at a different layer. In theory this would be great because you end up with a single source of truth. But I have never, ever seen this approach do anything other than needlessly complicate the solution, and it often ends very badly.
So if you're killing yourself trying to figure out where your validation code goes, be advised: in a practical solution to even a moderately complex problem, validation code will end up going in several places.
Arguably, the smallest method of encapsulation is a function.
float harmonic(int n)
{
    float h = 1.0;
    for (int i = 2; i <= n; i++) {
        h += 1.0 / i;
    }
    return h;
}
This function contains both code and data. When the function completes, it returns the data that it contains.
Classes encapsulate code and data in a similar manner. The only real difference is that you can have multiple functions (called "methods" in a class) operating on the same data, and multiple instances of that data.
Consider this partial code listing of a Complex Number class:
public class Complex {
    private final double re; // the real part
    private final double im; // the imaginary part

    // create a new object with the given real and imaginary parts
    public Complex(double real, double imag) {
        re = real;
        im = imag;
    }

    // return a new Complex object whose value is (this + b)
    public Complex plus(Complex b) {
        Complex a = this; // invoking object
        double real = a.re + b.re;
        double imag = a.im + b.im;
        return new Complex(real, imag);
    }

    // return a new Complex object whose value is (this * b)
    public Complex times(Complex b) {
        Complex a = this;
        double real = a.re * b.re - a.im * b.im;
        double imag = a.re * b.im + a.im * b.re;
        return new Complex(real, imag);
    }
}
Both of these examples of encapsulation are, shall we say, "self-contained." They don't rely on any external dependencies to function.
The problem of encapsulating code and data gets a bit more thorny when you start designing business applications, because business applications concern themselves primarily with collections of entities and the relationships between those entities. Operations that can be performed atomically on an individual entity do exist, but they are rare. It is more common to perform operations that affect the relationships between entities, or the state or number of entities within a collection. Consequently, most of the business logic is more likely to be found in object aggregates.
To illustrate, consider an ordinary business like Amazon. There's no particular reason to pick Amazon, other than it is unremarkably similar to other businesses in many ways: it has customers, inventory, orders, invoices, payments, credits: the usual suspects.
What can you encapsulate within a Customer entity that can be atomically executed, divorced from other entities? Well, maybe you can change their last name. That's a data change in the database that can happen atomically in a repository somewhere, using an anemic data model. Perhaps you can change their password hash. That requires some logic, but it's unlikely to live in the Customer entity. It's more likely to exist in some security module.
All of the interesting business logic lives outside of the fundamental entities. Consider an Invoice, which is not an individual entity, but rather an aggregate of several entities. What can you do inside an Invoice class, divorced from the rest of the system? Well, you can change the shipping address. That's simply a change to a foreign key in the Invoice entity. You can calculate a Total (the sum of the line item quantities and costs), and finally we get to some non-trivial logic that can be encapsulated in the entity itself. Maybe the line items have a line-item total property on them, so there's a bit of logic there.
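A sketch of that one genuinely encapsulatable piece of logic. The class shape is hypothetical; the point is that `total()` needs nothing outside the aggregate itself, while a balance calculation would.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;

public class Invoice {
    /** A line item carries the bit of logic mentioned above: its own extended total. */
    public static class LineItem {
        final BigDecimal unitCost;
        final int quantity;

        LineItem(BigDecimal unitCost, int quantity) {
            this.unitCost = unitCost;
            this.quantity = quantity;
        }

        BigDecimal lineTotal() {
            return unitCost.multiply(BigDecimal.valueOf(quantity));
        }
    }

    private final List<LineItem> items = new ArrayList<>();

    public void addItem(BigDecimal unitCost, int quantity) {
        items.add(new LineItem(unitCost, quantity));
    }

    /** Total is logic the aggregate can own: it needs only its own line items. */
    public BigDecimal total() {
        BigDecimal sum = BigDecimal.ZERO;
        for (LineItem item : items) {
            sum = sum.add(item.lineTotal());
        }
        return sum;
    }
}
```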
But what if you want to calculate a balance? Now you have to go somewhere else besides the Invoice to make that calculation, because the Invoice doesn't know anything about all of the other invoices (by design). That could happen in the Customer entity, but it's just as likely to occur in some Accounting module elsewhere.
And then you have linking entities, entities whose sole purpose is to provide connections between entities at the data level. There's generally no logic in those whatsoever.
So at the bottom of your data hierarchy are simple data transfer objects. When combined into aggregate objects, they become useful from a logic standpoint, and any or all of them are subject to processing by any number of software modules, treated as simply data. When you think about it, it doesn't really make much sense to bake a lot of business logic into something like a Customer object, because now you're tightly binding that object to your specific way of doing business.
Should classes encapsulate data and logic? Of course, when it is appropriate and useful to do so. The core idea in software design is suitability. There are no absolute principles; software design techniques must always be evaluated in the context of your specific system to determine if they are appropriate for your specific functional and non-functional requirements.
Best Answer
It's never okay to appease hypothetical performance goblins at the expense of fewer lines of code or an API that more clearly expresses the relationship between entities.
So the first example is preferred. Any decent ORM should be able to lazy-load the collection, and some of the more advanced ones will even let you define a property on an entity with a custom loading function, also lazy-loaded. That takes care of both of your use cases without breaking encapsulation of the Order class and exposing its inner data structure.
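The lazy-loading idea, stripped of any particular ORM, can be sketched like this. The `Order` shape and loader are illustrative assumptions; a real ORM generates the equivalent plumbing behind a mapped collection.

```java
import java.util.List;
import java.util.function.Supplier;

public class Order {
    private List<String> lineItems;               // loaded on demand, null until first access
    private final Supplier<List<String>> loader;  // stand-in for the ORM's custom loading function

    public Order(Supplier<List<String>> loader) {
        this.loader = loader;
    }

    /** Callers just see a collection; whether it was loaded eagerly or lazily stays hidden. */
    public List<String> getLineItems() {
        if (lineItems == null) {
            lineItems = loader.get(); // first access triggers the (potentially expensive) load
        }
        return lineItems;
    }
}
```

The caller's API is identical either way, which is exactly why this preserves encapsulation: the loading strategy is an implementation detail of `Order`, not part of its contract.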