OK. Problem 1: getting DTOs from entities.
Since your entities can expose their data publicly, you can access their properties and instantiate a DTO, or simply serialise the entity directly.
Problem 2: entities from DTOs.
A constructor that takes the properties to be set can be called with the properties of the DTO.
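The two conversions above can be sketched like this (a minimal illustration in Python; Person, PersonDto, and the field names are hypothetical, not from the question):

```python
from dataclasses import dataclass

@dataclass
class Person:        # domain entity (hypothetical)
    id: int
    name: str
    email: str

@dataclass
class PersonDto:     # transfer object exposing only what the caller needs
    id: int
    name: str

def to_dto(person: Person) -> PersonDto:
    # Problem 1: read the entity's public properties and build the DTO.
    return PersonDto(id=person.id, name=person.name)

def to_entity(dto: PersonDto, email: str) -> Person:
    # Problem 2: the entity's constructor takes the properties to set,
    # so we call it with the DTO's values (plus anything the DTO lacks).
    return Person(id=dto.id, name=dto.name, email=email)
```

The same shape works in any language; in C# the DTO would typically be a record or a POCO and the mapping a constructor call or a mapper method.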
Problem 3: large entities when you only need summary info.
Create a new summary object which you can retrieve from the repository. Note that I suggest you make strongly typed repositories with methods like Repo.GetMyObjectById(string id) rather than exposing a generic ORM.
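A strongly typed repository with a summary method might look like this sketch (Python with sqlite3 standing in for the real database; OrderSummary, the method name, and the table are assumptions):

```python
import sqlite3
from dataclasses import dataclass

@dataclass
class OrderSummary:
    """Hypothetical summary object: only the columns the caller needs."""
    id: int
    total: float

class OrderRepository:
    """Strongly typed: callers get named methods, not a generic ORM query surface."""
    def __init__(self, conn: sqlite3.Connection):
        self._conn = conn

    def get_summary_by_id(self, order_id: int) -> OrderSummary:
        # Selects only the summary columns instead of hydrating the full entity.
        row = self._conn.execute(
            "SELECT id, total FROM orders WHERE id = ?", (order_id,)
        ).fetchone()
        return OrderSummary(id=row[0], total=row[1])
```

Because the repository's surface is a closed set of named methods, callers cannot run arbitrary queries against the ORM.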
Problem 4: where to orchestrate all this.
My recommendation is to have a service class one level below your hosting service/app/website.
This has access to the repositories, DTOs and entities, and its methods map to the controllers/service calls of your application, so that you do not need any code at the top level and can host the same service in multiple ways.
Giving it access to the repos is not an issue when they can only be used to retrieve entities rather than run any query the caller likes.
Putting this assembly/orchestration logic inside the entity is usually bad as you will want the entity to be reused for other purposes.
This top-level service should be very light: just get these objects, call this method, create and return the result/DTO/viewmodel.
Because it's so light, it's not a massive problem if you skip the layer and put the code in your controller. But it will save you time if you change the hosting layer, and it helps with testing.
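The "get these objects, call this method, return the result" shape can be sketched like so (Python; PersonService, the in-memory repo, and the rename operation are all hypothetical illustrations):

```python
class Person:
    """Domain entity: the behavior lives here, not in the service."""
    def __init__(self, person_id, name):
        self.id, self.name = person_id, name

    def rename(self, new_name):
        self.name = new_name

class InMemoryPersonRepo:
    """Stand-in repository so the sketch is self-contained."""
    def __init__(self):
        self._store = {}

    def add(self, person):
        self._store[person.id] = person

    def get_by_id(self, person_id):
        return self._store[person_id]

class PersonService:
    """Thin orchestration layer sitting one level below the host."""
    def __init__(self, repo):
        self._repo = repo

    def rename(self, person_id, new_name):
        person = self._repo.get_by_id(person_id)       # get these objects
        person.rename(new_name)                        # call this method
        return {"id": person.id, "name": person.name}  # return the result/DTO
```

A controller, console host, or test harness can all call PersonService.rename directly, which is what makes rehosting and testing cheap.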
First of all, merely citing someone's article on the Internet does not constitute sufficient justification for changing a practice. You have to weigh the pros and cons, and make up your own mind.
Note that the author of the article you cited hates ORMs. He follows a strict principle of encapsulating code with its data, which is more or less the foundational principle of object orientation. He strongly dislikes ORMs because they strip classes of their intelligence, and he's not wrong about that.
In his ORM-hate article, he writes (more or less) that classes should be responsible for saving themselves to the database, a practice that violates a principle called "persistence ignorance." Persistence ignorance simply means that classes shouldn't know anything about their database overlord, and it's difficult to achieve this in any realistic manner with ORMs. He gets around the persistence ignorance problem by using interfaces, making his data ignorant of its underlying implementation.
To be fair, I write code that looks a lot like his under the hood, albeit a bit more streamlined than his (Java has a reputation for being very verbose, and I don't bother with the DTO interfaces). After wrestling with Entity Framework for a while, I began using Dapper and writing SQL queries instead. I get better performance, less complexity and finer targeting of the database.
The class that implements your so-called "data transfer object" is a nice place to put this code; all you have to do is hand it an IDbConnection object, and the class has all it needs to read from and persist itself to the database.
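In spirit, such a self-persisting record looks like this sketch (Python, with a sqlite3 connection standing in for IDbConnection; the class and table names are made up):

```python
import sqlite3
from dataclasses import dataclass

@dataclass
class PersonRecord:
    """Hypothetical record that reads and persists itself, given a connection."""
    id: int
    name: str

    @staticmethod
    def load(conn: sqlite3.Connection, person_id: int) -> "PersonRecord":
        # Hand the class a connection and it knows how to read itself.
        row = conn.execute(
            "SELECT id, name FROM people WHERE id = ?", (person_id,)
        ).fetchone()
        return PersonRecord(id=row[0], name=row[1])

    def save(self, conn: sqlite3.Connection) -> None:
        # ...and how to persist itself (insert-or-update on the primary key).
        conn.execute(
            "INSERT OR REPLACE INTO people (id, name) VALUES (?, ?)",
            (self.id, self.name),
        )
```

In the C#/Dapper version, load and save would take the IDbConnection and run the same parameterized queries through Dapper's extension methods.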
I guess my question is, why does this have to be an either/or choice? If you need something from the database that requires a DTO, then use a DTO. If you want what you're calling a "domain object," then use that instead.
Best Answer
This really depends on many factors. For now, I'm going to assume the following layered architecture:
For one, it depends on where your DTO lives. Is this DTO created at the business level, or at the datalayer level?
(a Guid personId and a string newName). Note: it's of course possible to have DTOs on both layers, but the outcome is the same as if you only had a datalayer DTO.
Secondly, it depends on what technology you're using to interact with your database.
If you're using Dapper or any other method where you are crafting the actual SQL query, this gives you the option of writing an explicit update query which only touches the fields you want it to.
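For instance, a handcrafted update that touches only one column, with no prior fetch (sketched with Python's sqlite3 rather than Dapper; the table and column names are assumptions):

```python
import sqlite3

def update_person_name(conn: sqlite3.Connection,
                       person_id: int, new_name: str) -> None:
    # One round-trip, one column: nothing else on the row is read or written.
    conn.execute(
        "UPDATE people SET name = ? WHERE id = ?",
        (new_name, person_id),
    )
```

The Dapper equivalent is the same SQL string passed to conn.Execute with an anonymous parameter object; the point is that you, not a query generator, decide exactly which fields the statement touches.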
However, this becomes more cumbersome when you start dealing with multiple update queries which strongly resemble each other or reuse nontrivial logic. Little by little, it starts violating DRY.
If you're using Entity Framework or any other similar library that means you have the SQL generated for you, you'll generally be better off sticking to the recommended approach for the library you're using. I would assume that any decent library would be able to do targeted updates of only the fields you want to have updated. But since you're dealing with a query generator (the library), you need to use it the way it expects you to use it.
Interacting with relational databases via network calls leads to tradeoffs. You're going to either have to sacrifice performance or code cleanliness (to some degree).
As a basic example, consider that repositories were initially intended to operate on one specific entity each. If you want to fetch Person and Car objects, you'll need to talk to the PersonRepository and CarRepository respectively.
From a development perspective, this is a really neat and clean way of separating different steps. And when you're dealing with an in-memory list of data, there is no real performance loss.
However, when dealing with a networked relational database, you want to minimize the amount of calls you make because the network calls always cost overhead. If you're trying to fetch a list of people and the cars they own, you're better off launching a single query that fetches both at the same time, and let the database handle the collation of those two entities.
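A single join query that lets the database do the collation might look like this (sketched with Python's sqlite3; the schema is made up for illustration):

```python
import sqlite3

def people_with_cars(conn: sqlite3.Connection):
    # One round-trip instead of one call per repository:
    # the database joins the two entities itself.
    return conn.execute(
        "SELECT p.name, c.model FROM people p "
        "JOIN cars c ON c.owner_id = p.id "
        "ORDER BY p.name, c.model"
    ).fetchall()
```

Note that this function already sits awkwardly in a per-entity repository scheme, which is exactly the tradeoff described next.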
But does that "get both" call belong to PersonRepository or to CarRepository? That's no longer clear. This means you've compromised your idealistic definition of a repository, because you are now trying to run queries that use multiple entity types at the same time.
This is why it's a tradeoff. Because the performance hit would otherwise be too significant, we trade away the "perfect" repositories in favor of better performance. So when you ask:
The real answer is: whatever way maximizes what is most important to you.
This leads to the cleanest code and ease of development, but it effectively doubles your network calls (fetch + update) which can become a bottleneck.
Then again, if you've got bandwidth to spare and performance is not the #1 priority, but developer availability is limited, it may be better to favor clean code so you can minimize development and maintenance time as best you can.
This is the more performant option. It requires handcrafting an update that only targets specific fields, which may take more development effort, but it will pay dividends in performance.
I'm not quite sure how this is different from option 1. I think the only difference here is that you split the "update person" responsibility over two classes.
That could be better codewise, but I don't think it's needed (based on my current understanding of your situation) and thus would advise against it.
Performancewise, I'm expecting this to be equal to option 1.
Options 1 and 2 tackle the most common tradeoff: ease of development versus runtime performance. But it's possible that you have other things to consider too (e.g. using a particular approach because your company ubiquitously uses it). In such a case, you need to find the approach that works best for your list of priorities.