Layered Architecture – Updating DB from a DTO in Repository


Given a Person model:

public class Person
{
    public int PersonId { get; set; }
    public string Name { get; set; }
    public string Email { get; set; }
}

and an UpdatePersonDto which just updates the Name

public class UpdatePersonDto
{
    public int PersonId { get; set; }
    public string Name { get; set; }
}

How should the repository handle the update process?

  1. Accept the UpdatePersonDto object, fetch the current Person object from the DB, map the fields and then update
  2. Accept the UpdatePersonDto object and perform an update on the DB just using the DTO. The Person domain model is not needed in this case
  3. Require the Person model and leave the mapping to a higher layer
  4. Something else?

Best Answer

This really depends on many factors. For now, I'm going to assume the following layered architecture:

Consuming application (e.g. web) > Business > Datalayer > The actual database

For one, it depends on where your DTO lives. Is this DTO created on the business level, or the datalayer level?

  • If on the business level, then your datalayer doesn't know the DTO. That means that communication between business and datalayer happens via either domain models, or a custom set of primitive parameters (e.g. Guid personId and string newName).
  • If on the datalayer level, then your datalayer can use the DTO. This works much like the "custom set of primitive parameters" situation; the only difference is that instead of passing individual primitives, you're passing a single object that bundles them together (a sketch of both styles follows below).

Note: it's of course possible to have DTOs on both layers, but the outcome is the same as if you only had a datalayer DTO.
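
To make the difference concrete, here is a rough sketch of what the datalayer boundary could look like in each case. The interface names and members are illustrative only, not taken from your question:

public interface IPersonRepository
{
    // DTO lives on the business level: the datalayer never sees it,
    // so it exposes domain models or primitive parameters instead.
    Task<Person> GetByIdAsync(int personId);
    Task UpdateAsync(Person person);                     // domain model
    Task UpdateNameAsync(int personId, string newName);  // primitive parameters
}

public interface IPersonDtoRepository
{
    // DTO lives on the datalayer level: the DTO is just a named bag of
    // those same primitives, so the repository can accept it directly.
    Task UpdateAsync(UpdatePersonDto dto);
}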


Secondly, it depends on what technology you're using to interact with your database.

If you're using Dapper or any other method where you are crafting the actual SQL query, this gives you the option of writing an explicit update query which only touches the fields you want it to.
However, this becomes more cumbersome once you have many update queries that strongly resemble each other or share nontrivial logic; little by little, it starts violating DRY.
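
As a concrete sketch of that hand-written approach, using Dapper (the table name, connection handling and method name are my assumptions, not from your question):

using Dapper;
using Microsoft.Data.SqlClient;

public async Task UpdateNameAsync(UpdatePersonDto dto)
{
    using var connection = new SqlConnection(_connectionString);
    // The hand-crafted query touches only the column we actually want to change.
    const string sql = "UPDATE People SET Name = @Name WHERE PersonId = @PersonId";
    await connection.ExecuteAsync(sql, new { dto.Name, dto.PersonId });
}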

If you're using Entity Framework or any similar library that generates the SQL for you, you'll generally be better off sticking to the recommended approach for that library. I would assume that any decent library can do targeted updates of only the fields you want updated; but since you're dealing with a query generator (the library), you need to use it the way it expects to be used.
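
If you do want a targeted update with Entity Framework Core specifically, one common way (a sketch; the _context field and People DbSet are my assumptions) is to attach a stub entity and mark only the changed property as modified:

var stub = new Person { PersonId = dto.PersonId, Name = dto.Name };
_context.People.Attach(stub);
// Only the Name column ends up in the generated UPDATE statement.
_context.Entry(stub).Property(p => p.Name).IsModified = true;
await _context.SaveChangesAsync();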


Interacting with relational databases over the network leads to tradeoffs. You're going to have to sacrifice either performance or code cleanliness (to some degree).

As a basic example, consider that repositories were initially intended to operate for one specific entity each. If you want to fetch Person and Car objects, you'll need to talk to the PersonRepository and CarRepository respectively.
From a development perspective, this is a really neat and clean way of separating different steps. And when you're dealing with an in-memory list of data, there is no real performance loss.

However, when dealing with a networked relational database, you want to minimize the number of calls you make, because every network call costs overhead. If you're trying to fetch a list of people and the cars they own, you're better off launching a single query that fetches both at the same time and letting the database handle combining the two entities.
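
As a sketch of that single-round-trip idea with Dapper (it assumes a hypothetical Car class with a CarId key, a Cars.OwnerId foreign key and a Person.Cars collection, none of which exist in your models):

using Dapper;

const string sql = @"
    SELECT p.*, c.*
    FROM People p
    LEFT JOIN Cars c ON c.OwnerId = p.PersonId";

var peopleById = new Dictionary<int, Person>();

// Dapper hands us one (person, car) pair per joined row;
// collapse them back into one Person holding a list of Cars.
await connection.QueryAsync<Person, Car, Person>(sql, (person, car) =>
{
    if (!peopleById.TryGetValue(person.PersonId, out var existing))
    {
        existing = person;
        peopleById[existing.PersonId] = existing;
    }
    if (car != null)
        existing.Cars.Add(car);
    return existing;
}, splitOn: "CarId");

var peopleWithCars = peopleById.Values;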

But does that "get both" call belong to PersonRepository or to CarRepository? That's no longer clear. This means you've compromised your idealistic definition of a repository, because you are now trying to run queries that use multiple entity types at the same time.

This is why it's a tradeoff. Because the performance hit would otherwise be too significant, we trade away the "perfect" repositories in favor of better performance. So when you ask:

How should the repository handle the update process?

The real answer is: in whatever way maximizes what is most important to you.

  1. Accept the UpdatePersonDto object, fetch the current Person object from the DB, map the fields and then update

This gives you the cleanest code and the easiest development, but it effectively doubles your network calls (fetch + update), which can become a bottleneck.

Then again, if you've got bandwidth to spare and performance is not the #1 priority, but developer availability is limited, it may be better to favor clean code so you can minimize development and maintenance time as much as you can.
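
In code, option 1 inside the repository could look roughly like this (a sketch assuming EF Core; the _context field and method name are assumptions):

public async Task UpdateAsync(UpdatePersonDto dto)
{
    // Round trip 1: load the current state of the person.
    var person = await _context.People.FindAsync(dto.PersonId)
        ?? throw new InvalidOperationException($"Person {dto.PersonId} not found.");

    // Map only the fields the DTO carries.
    person.Name = dto.Name;

    // Round trip 2: persist the change.
    await _context.SaveChangesAsync();
}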

  2. Accept the UpdatePersonDto object and perform an update on the DB just using the DTO. The Person domain model is not needed in this case

This is the more performant option. It requires handcrafting an update that only targets specific fields, which may mean more development effort, but it will pay dividends in performance.
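
The raw-SQL version of this was sketched in the Dapper example above. If you're on EF Core 7 or later instead, ExecuteUpdateAsync can produce the same single, targeted UPDATE without loading the entity (again a sketch; the _context field is an assumption):

using Microsoft.EntityFrameworkCore;

public async Task UpdateNameAsync(UpdatePersonDto dto)
{
    // Issues one UPDATE ... SET Name = ... WHERE PersonId = ... statement.
    await _context.People
        .Where(p => p.PersonId == dto.PersonId)
        .ExecuteUpdateAsync(setters => setters.SetProperty(p => p.Name, dto.Name));
}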

  3. Require the Person model and leave the mapping to a higher layer

I'm not quite sure how this is different from option 1. I think the only difference here is that you split the "update person" responsibility over two classes.
That could be better codewise, but I don't think it's needed (based on my current understanding of your situation) and thus would advise against it.

Performancewise, I'm expecting this to be equal to option 1.
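
For completeness, option 3 would look something like this: the higher layer does the fetch-and-map, and the repository only ever receives the full Person model (a sketch; PersonService and the repository methods are illustrative names matching the interface sketch earlier, not anything from your question):

public class PersonService
{
    private readonly IPersonRepository _personRepository;

    public PersonService(IPersonRepository personRepository) =>
        _personRepository = personRepository;

    public async Task UpdateNameAsync(UpdatePersonDto dto)
    {
        var person = await _personRepository.GetByIdAsync(dto.PersonId); // round trip 1
        person.Name = dto.Name;                                          // the mapping now lives here
        await _personRepository.UpdateAsync(person);                     // round trip 2
    }
}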

  4. Something else?

Options 1 and 2 tackle the most common tradeoff: ease of development versus runtime performance. But it's possible that you have other things to consider too (e.g. sticking to a particular approach because your company uses it ubiquitously). In such a case, you need to find the approach that works best for your list of priorities.
