How does encapsulation help contain the ripple effects of a code change?
For a data member, if I change its type from int to float, then even if
I expose it through a property I will still need to change the variable
type everywhere I already use this code.
The benefit of encapsulation is that it lets you change the internal implementation without breaking client code. It doesn't protect you if you decide that you need to change the interface to your code, but that's a different matter.
Example: Say you have a value representing the price per unit of some commodity. The price is expressed in cents, and because you don't deal in fractional cents you decided to make the property an integer (I'll use C here because I'm not very familiar with C#):
int _price;   /* price per unit, in whole cents */

int pricePerUnit(void) {
    return _price;
}

int priceForUnits(int units) {
    return units * _price;
}
That all works out fine until one day when somebody notices that your firm is losing a lot of money due to rounding errors. Many of the commodities that you track are bought and sold in lots of many thousands of units, so you need to start tracking the price to an accuracy of at least 0.001 cent. Because you were smart enough to encapsulate the price instead of letting clients access it directly, you can make that change pretty quickly:
double _dprice;   /* price per unit, in fractional cents */

int pricePerUnit(void) {
    return (int)_dprice;
}

int priceForUnits(int units) {
    return (int)(units * _dprice);
}
The interface that clients use to obtain prices stays the same, but the data they get back is now more accurate. If the price per unit is $1.001, priceForUnits(1000000) will now return a price that's $1000 greater than before. That happens even though you haven't changed the interface to your system at all, and you therefore haven't broken any client code.
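To make that arithmetic concrete, here is a minimal sketch that puts the two versions side by side. The names price_for_units_old, price_for_units_new and price_difference are made up for the comparison, the prices are hard-coded (100 cents versus 100.1 cents, i.e. $1.001), and the new version rounds to the nearest cent instead of truncating, which is my own assumption about what you would want:

```c
/* Old implementation: the price is stored as whole cents,
   so $1.001 has already been cut down to 100 cents. */
int price_for_units_old(int units) {
    const int price = 100;
    return units * price;
}

/* New implementation: the price is stored as fractional cents.
   Adding 0.5 before the cast rounds non-negative totals to the
   nearest cent (an assumption; a plain cast would truncate). */
int price_for_units_new(int units) {
    const double dprice = 100.1;
    return (int)(units * dprice + 0.5);
}

/* Difference between the two totals, in cents. */
int price_difference(int units) {
    return price_for_units_new(units) - price_for_units_old(units);
}
```

For one million units the two totals differ by 100,000 cents, which is the $1000 mentioned above, and the signature that callers see has not changed.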
Now, that may not always be all that you need to do. Sometimes you'll need to change or augment your interface so that you can report the price more accurately to clients, too:
double pricePerUnit(void) {
    return _dprice;
}
A change like that will break client code, so you might instead keep the old interface and provide a newer, better routine:
int pricePerUnit(void) {
    return (int)_dprice;
}

double accuratePricePerUnit(void) {
    return _dprice;
}
You and the rest of your team can then embark on the process of converting all the clients of your system to use the newer, better accuratePricePerUnit(). The client code will get more accurate as you make progress on that task, but even the old stuff should continue to work as well as it did in the past.
Anyway, the point is that encapsulation lets you change the way the internals work while presenting a consistent interface, and that helps you make useful changes without breaking other code. It doesn't always protect you from having to update other code, but it can at least help you do that in a controlled manner.
This is a more well-formed transcription of my initial comment under your question. The answers to the questions raised by the OP can be found at the bottom of this answer. Please also check the important note at the very end.
What you are currently describing, Sipo, is a design pattern called Active Record. Like everything else, it has found its place among programmers, but it has largely been displaced by the repository and data mapper patterns for one simple reason: scalability.
In short, an active record is an object, which:
- represents an object in your domain (includes business rules, knows how to handle certain operations on the object, such as if you can or cannot change a username and so forth),
- knows how to retrieve, update, save and delete the entity.
You raise several issues with your current design, and the main problem is the one in your last (sixth) point. When you are designing a constructor for a class and you do not even know what the constructor should do, the class is probably doing something wrong. That is what happened in your case.
Fixing the design is actually pretty simple: split the entity representation and the CRUD logic into two (or more) classes.
This is what your design looks like now:
Employee
- contains information about the employee structure (its attributes) and methods for modifying the entity (if you decide to go the mutable way), contains CRUD logic for the Employee entity, can return a list of Employee objects, accepts an Employee object when you want to update an employee, can return a single Employee through a method like getSingleById(id : string) : Employee
Wow, the class seems huge.
This will be the proposed solution:
Employee
- contains information about the employee structure (its attributes) and methods for modifying the entity (if you decide to go the mutable way)
EmployeeRepository
- contains CRUD logic for the Employee entity, can return a list of Employee objects, accepts an Employee object when you want to update an employee, can return a single Employee through a method like getSingleById(id : string) : Employee
Have you heard of separation of concerns? If not, you will now. It is the less strict version of the Single Responsibility Principle, which says a class should have only one responsibility, or as Uncle Bob says:
A module should have one and only one reason to change.
It is quite clear that if I was able to clearly split your initial class into two which still have a well rounded interface, the initial class was probably doing too much, and it was.
What is great about the repository pattern is that it not only acts as an abstraction, a middle layer between your domain and the database (which can be anything: a file, NoSQL, SQL, an object-oriented store), but it does not even need to be a concrete class. In many OO languages you can define the interface as an actual interface (or as a class with pure virtual methods if you are in C++) and then have multiple implementations.
This settles the question of whether callers depend on a concrete repository implementation or only on its interface: they depend on a structure declared with the interface keyword and nothing more. And a repository is exactly that: a fancy term for a data layer abstraction, namely mapping data to your domain and vice versa.
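Sticking with C, which has no interface keyword, the closest equivalent is a struct of function pointers. The following is only a sketch under that assumption: the in-memory store, the function names, and the use of an int id (instead of the string id in getSingleById above) are all my own simplifications.

```c
#include <stddef.h>

/* The entity: plain data, no persistence logic.
   id == 0 means "not saved yet" (an assumption of this sketch). */
typedef struct {
    int id;
    char name[32];
} Employee;

/* The "interface": a table of function pointers plus opaque state. */
typedef struct EmployeeRepository {
    void *state;
    int (*save)(struct EmployeeRepository *self, Employee *e);
    const Employee *(*get_single_by_id)(struct EmployeeRepository *self, int id);
} EmployeeRepository;

/* One possible implementation: keeps employees in a fixed-size array. */
#define MAX_EMPLOYEES 16
typedef struct {
    Employee rows[MAX_EMPLOYEES];
    int count;
} InMemoryStore;

static int in_memory_save(EmployeeRepository *self, Employee *e) {
    InMemoryStore *store = self->state;
    if (e->id != 0 || store->count == MAX_EMPLOYEES) {
        return -1;                      /* already saved, or store is full */
    }
    e->id = store->count + 1;           /* assign the next id */
    store->rows[store->count++] = *e;   /* store a copy of the entity */
    return 0;
}

static const Employee *in_memory_get(EmployeeRepository *self, int id) {
    InMemoryStore *store = self->state;
    for (int i = 0; i < store->count; i++) {
        if (store->rows[i].id == id) {
            return &store->rows[i];
        }
    }
    return NULL;                        /* no employee with that id */
}

/* Wires the in-memory implementation into the interface. */
EmployeeRepository in_memory_repository(InMemoryStore *store) {
    store->count = 0;
    EmployeeRepository repo = { store, in_memory_save, in_memory_get };
    return repo;
}
```

Client code only ever calls repo.save(&repo, &employee) and repo.get_single_by_id(&repo, id), so a file-backed or SQL-backed implementation could be swapped in by filling the same struct with different functions.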
Another great thing about separating it into (at least) two classes is that now the Employee class can clearly manage its own data and do it very well, because it does not need to take care of other difficult things.
Question 6: So what should the constructor do in the newly created Employee class? It is simple. It should take the arguments, check that they are valid (for example, an age should not be negative and a name should not be empty), raise an error if the data is invalid and, if validation passes, assign the arguments to the private variables of the entity. It can no longer communicate with the database, because it simply has no idea how to do that.
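As a rough illustration of that constructor in C, where "raising an error" becomes returning an error code: the function name employee_init, the field set, and the 32-byte name limit are assumptions of this sketch, not something taken from your code.

```c
#include <stddef.h>
#include <string.h>

#define EMPLOYEE_NAME_MAX 32

typedef struct {
    int id;                          /* 0 until a repository assigns one */
    char name[EMPLOYEE_NAME_MAX];
    int age;
} Employee;

/* "Constructor": validates its arguments, then fills in the entity.
   Returns 0 on success and -1 if the data is invalid.
   It never touches a database. */
int employee_init(Employee *e, const char *name, int age) {
    if (e == NULL || name == NULL) {
        return -1;
    }
    size_t len = strlen(name);
    if (len == 0 || len >= EMPLOYEE_NAME_MAX) {
        return -1;                   /* empty name, or too long to store */
    }
    if (age < 0) {
        return -1;                   /* negative age */
    }
    e->id = 0;
    memcpy(e->name, name, len + 1);  /* copy including the terminator */
    e->age = age;
    return 0;
}
```

Note that the entity is only written to after every check has passed, so a failed call leaves the caller's struct untouched.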
Question 4: Cannot be answered at all, not generally, because the answer heavily depends on what exactly you need.
Question 5: Now that you have separated the bloated class into two, you can have multiple update methods directly on the Employee class, like changeUsername and markAsDeceased, which manipulate the data of the Employee object only in memory. You could then add a method such as registerDirty, from the Unit of Work pattern, to the repository class. Through it you let the repository know that this object has changed properties and will need to be updated once you call the commit method.
Obviously, an object must have an id, and thus already be saved, before it can be updated. It is the repository's responsibility to detect this and raise an error when that criterion is not met.
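A stripped-down sketch of that idea in C might look like the following. Everything here is hypothetical: the struct name TrackingEmployeeRepository, the fixed-size dirty list, and the count_update callback that stands in for a real UPDATE statement.

```c
#include <stddef.h>

typedef struct {
    int id;          /* 0 means the employee has never been saved */
    int deceased;
} Employee;

/* Entity method: changes state in memory only. */
void employee_mark_as_deceased(Employee *e) {
    e->deceased = 1;
}

#define MAX_DIRTY 8

/* Repository that remembers which entities have changed. */
typedef struct {
    Employee *dirty[MAX_DIRTY];
    int dirty_count;
} TrackingEmployeeRepository;

/* Registers a changed entity. Rejects entities that were never saved
   (no id), and fails if the dirty list is full. */
int register_dirty(TrackingEmployeeRepository *repo, Employee *e) {
    if (e->id == 0 || repo->dirty_count == MAX_DIRTY) {
        return -1;
    }
    repo->dirty[repo->dirty_count++] = e;
    return 0;
}

/* Passes every registered entity to the update callback, clears the
   list, and returns the number of entities written. */
int commit(TrackingEmployeeRepository *repo, void (*update)(const Employee *e)) {
    int written = 0;
    for (int i = 0; i < repo->dirty_count; i++) {
        update(repo->dirty[i]);
        written++;
    }
    repo->dirty_count = 0;
    return written;
}

/* Stand-in for a real UPDATE statement: just counts calls. */
int updates_issued = 0;
void count_update(const Employee *e) {
    (void)e;
    updates_issued++;
}
```

The entity never learns that persistence exists; it only changes its own fields, and the repository decides when (and whether) those changes reach storage.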
Question 3: If you decide to go with the Unit of Work pattern, the create method becomes registerNew. If you do not, I would probably call it save instead. The goal of a repository is to provide an abstraction between the domain and the data layer, so I recommend that this method (be it registerNew or save) accepts the Employee object and leaves it up to the classes implementing the repository interface to decide which attributes to take out of the entity. Passing the entire object is better because you then do not need many optional parameters.
Question 2: Both methods will now be part of the repository interface, and they do not violate the single responsibility principle. The responsibility of the repository is to provide CRUD operations for Employee objects, and that is what it does (besides Read and Delete, CRUD covers both Create and Update). Obviously, you could split the repository even further by having an EmployeeUpdateRepository and so forth, but that is rarely needed, and a single implementation can usually contain all CRUD operations.
Question 1: You ended up with a simple Employee class which will now (among other attributes) have an id. Whether the id is filled or empty (or null) depends on whether the object has already been saved. Nonetheless, an id is still an attribute the entity owns, and the responsibility of the Employee entity is to take care of its attributes, hence taking care of its id.
Whether or not an entity has an id does not usually matter until you try to perform some persistence logic on it. As mentioned in the answer to question 5, it is the repository's responsibility to detect that you are trying to save an entity which has already been saved, or to update an entity without an id.
Important note
Please be aware that although separation of concerns is great, designing a functional repository layer is quite tedious work and, in my experience, a bit more difficult to get right than the active record approach. But you will end up with a design which is far more flexible and scalable, which may be a good thing.
Best Answer
Here are a couple suggestions to help you think in new directions:
First, it seems like you're trying to tie your calculation functionality specifically to your Employee class. That seems unnecessarily specific. Why should your calculator class care whether the object it's working on represents an employee or something else? This seems like a natural place to use an interface to define the functionality that your calculator needs to apply a formula, without caring what the thing that implements the interface actually represents. And the functionality that your calculator needs is probably pretty simple: it just needs to be able to get named values and possibly also set named values. If you have a formula like:
then it seems like you might need a Calculable interface that has a function like valueForKey(key), where key is a string, so that the calculator can fetch values for yearsOfService and bonusDays in order to do its work. And the interface should also have a setValueForKey(value, key) method so that it can store the result of the formula for the annualVacationDays key. But the only things the calculator needs to do its work are those methods -- it shouldn't care what kind of object it is.
Second, given that the point of OOP is that you can combine data and the operations on that data, separating the formulas from the class that applies them doesn't make a lot of sense to me. Let each formula be able to apply itself to an object that implements the Calculable interface.
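Here is a rough sketch of both suggestions in C, using a struct of function pointers as the interface. The key names come from the discussion above, but the arithmetic in apply_vacation_formula (10 base days plus years of service plus bonus days) is invented purely for illustration, as are the function names and the choice to read unknown keys as 0.

```c
#include <stddef.h>
#include <string.h>

/* The "interface": anything that can get and set named numeric values. */
typedef struct Calculable {
    void *object;
    double (*value_for_key)(void *object, const char *key);
    void (*set_value_for_key)(void *object, double value, const char *key);
} Calculable;

/* One implementer. The formula below never sees this type directly. */
typedef struct {
    double years_of_service;
    double bonus_days;
    double annual_vacation_days;
} Employee;

/* Maps a key to the matching field, or NULL for an unknown key. */
static double *employee_field(Employee *e, const char *key) {
    if (strcmp(key, "yearsOfService") == 0) return &e->years_of_service;
    if (strcmp(key, "bonusDays") == 0) return &e->bonus_days;
    if (strcmp(key, "annualVacationDays") == 0) return &e->annual_vacation_days;
    return NULL;
}

static double employee_value_for_key(void *object, const char *key) {
    double *field = employee_field(object, key);
    return field ? *field : 0.0;   /* unknown keys read as 0 (assumption) */
}

static void employee_set_value_for_key(void *object, double value, const char *key) {
    double *field = employee_field(object, key);
    if (field) *field = value;     /* unknown keys are ignored (assumption) */
}

/* Wraps an Employee so it can be handed to any formula. */
Calculable employee_as_calculable(Employee *e) {
    Calculable c = { e, employee_value_for_key, employee_set_value_for_key };
    return c;
}

/* A formula that applies itself to any Calculable.
   The arithmetic is made up for this example. */
void apply_vacation_formula(Calculable *c) {
    double years = c->value_for_key(c->object, "yearsOfService");
    double bonus = c->value_for_key(c->object, "bonusDays");
    c->set_value_for_key(c->object, 10.0 + years + bonus, "annualVacationDays");
}
```

The formula only touches the two function pointers, so the same apply_vacation_formula would work on a contractor, a department, or anything else that can fill in a Calculable.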