I'm a bit confused with regard to the repository
Not your fault.
It may help to review chapter six of the blue book. Evans writes
The REPOSITORY retrieves the requested object, encapsulating the machinery of database queries and metadata mapping.
In other words, the REPOSITORY API is expressed in the domain-specific language, and the domain-agnostic concerns of "how do I get this data to/from the database" are enclosed within the implementation.
From this, you can infer that the consumer of the repository API should be speaking in domain-specific values, not domain-agnostic primitives.
This is consistent with a theme that Evans comes back to repeatedly: that you want your developers to be spending most of their time in the domain-specific language, not manipulating primitive data types.
If it should be queried using a ValueObject (i.e. Category), who should convert the list of strings from the API request into the list of categories?
So the capability for this conversion will normally be provided by the domain model, in the form of value object constructors, factories, message builders, and so on. These tools will normally be invoked by the application code, before everything moves into the domain-specific view.
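As a minimal sketch (the `Category` value object and its `fromStrings` factory are hypothetical names, not from the question), the conversion capability might look like this: the value object validates itself on construction, and application code calls the factory on the raw strings before anything reaches the repository.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical value object: the constructor enforces the domain's
// invariants, so an invalid Category can never exist.
final class Category {
    private final String name;

    Category(String name) {
        if (name == null || name.isBlank()) {
            throw new IllegalArgumentException("category name must be non-empty");
        }
        this.name = name.trim().toLowerCase();
    }

    String name() { return name; }

    // Application code converts the primitive strings from the API
    // request here, before everything moves into the domain-specific view.
    static List<Category> fromStrings(List<String> raw) {
        return raw.stream().map(Category::new).collect(Collectors.toList());
    }

    @Override public boolean equals(Object o) {
        return o instanceof Category && ((Category) o).name.equals(name);
    }
    @Override public int hashCode() { return name.hashCode(); }
}
```

From there the repository API only ever sees `Category` values, never raw strings.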
That said... there's been a sort of pushback against that approach. It's really kind of weird to start from a string, convert it into a MumbleId, pass that to a repository that converts it back into a string for use as a query parameter, get back some document/recordset/bytes, reconstitute that into a domain model, and then turn around and convert it into a JSON document that we send back over the wire as bytes.
CQRS is, in part, an answer to the question: can't we just bypass the entire model and copy the bytes that we need? Yes, sometimes that makes a lot of sense.
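A rough sketch of that read-side shortcut, under assumed names (`PersonSummary`, `PersonReadQueries` are illustrative, and plain maps stand in for database rows to keep it self-contained): the query result is copied straight into a DTO that gets serialized as-is, with no domain model in between.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Read-side DTO: shaped for the wire, not for the domain.
record PersonSummary(String id, String name) {}

final class PersonReadQueries {
    // Stands in for "SELECT id, name FROM person" plus JSON mapping;
    // the rows are plain maps here instead of a real result set.
    static List<PersonSummary> listPeople(List<Map<String, String>> rows) {
        return rows.stream()
                .map(r -> new PersonSummary(r.get("id"), r.get("name")))
                .collect(Collectors.toList());
    }
}
```

The write side still goes through the full domain model; only reads take the bypass.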
This really depends on many factors. For now, I'm going to assume the following layered architecture:
Consuming application (e.g. web) > Business > Datalayer > The actual database
For one, it depends on where your DTO lives. Is this DTO created on the business level, or the datalayer level?
- If on the business level, then your datalayer doesn't know the DTO. That means that communication between business and datalayer happens via either domain models, or a custom set of primitive parameters (e.g. `Guid personId` and `string newName`).
- If on the datalayer level, then your datalayer can use the DTO, which means that this operates exactly like the "custom set of primitive parameters" situation, with the only difference that instead of custom primitive parameters, you're passing a custom object which acts as the collection of custom primitive parameters.
Note: it's of course possible to have DTOs on both layers, but the outcome is the same as if you only had a datalayer DTO.
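To illustrate why the outcome is the same (all names here are hypothetical; a simple in-memory map stands in for the database): the DTO form is just the primitive-parameter form with the parameters bundled into one object.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Assumed DTO shape, mirroring the question's UpdatePersonDto.
record UpdatePersonDto(UUID personId, String newName) {}

interface PersonDataLayer {
    // Primitive-parameter form: the datalayer never learns about the DTO.
    void updateName(UUID personId, String newName);

    // DTO form: exactly the same data, bundled into one object.
    default void updateName(UpdatePersonDto dto) {
        updateName(dto.personId(), dto.newName());
    }
}

// Minimal in-memory stand-in showing both calls end up identical.
final class InMemoryPersonDataLayer implements PersonDataLayer {
    final Map<UUID, String> names = new HashMap<>();
    public void updateName(UUID personId, String newName) {
        names.put(personId, newName);
    }
}
```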
Secondly, it depends on what technology you're using to interact with your database.
If you're using Dapper or any other method where you are crafting the actual SQL query, this gives you the option of writing an explicit update query which only touches the fields you want it to.
However, this becomes more cumbersome when you start dealing with multiple update queries which strongly resemble each other or reuse nontrivial logic. Little by little, it starts violating DRY.
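A tiny sketch of where that DRY pressure leads (the `UpdateSqlBuilder` helper is hypothetical, not from any library): once several handcrafted updates look alike, a shared builder that emits a field-targeted `UPDATE` with `?` placeholders becomes tempting.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: builds an UPDATE that touches only the listed
// columns. Values are bound separately as parameters by your DB driver;
// only the SQL text is produced here.
final class UpdateSqlBuilder {
    static String update(String table, List<String> changedColumns) {
        String setClause = changedColumns.stream()
                .map(col -> col + " = ?")
                .collect(Collectors.joining(", "));
        return "UPDATE " + table + " SET " + setClause + " WHERE id = ?";
    }
}
```

For example, `UpdateSqlBuilder.update("person", List.of("name"))` yields an update that leaves every other column alone.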
If you're using Entity Framework or any other similar library that means you have the SQL generated for you, you'll generally be better off sticking to the recommended approach for the library you're using. I would assume that any decent library would be able to do targeted updates of only the fields you want to have updated. But since you're dealing with a query generator (the library), you need to use it the way it expects you to use it.
Interacting with relational databases via network calls leads to tradeoffs. You're going to have to sacrifice either performance or code cleanliness (to some degree).
As a basic example, consider that repositories were initially intended to operate on one specific entity each. If you want to fetch `Person` and `Car` objects, you'll need to talk to the `PersonRepository` and `CarRepository` respectively.
From a development perspective, this is a really neat and clean way of separating different steps. And when you're dealing with an in-memory list of data, there is no real performance loss.
However, when dealing with a networked relational database, you want to minimize the number of calls you make, because every network call costs overhead. If you're trying to fetch a list of people and the cars they own, you're better off launching a single query that fetches both at the same time, and letting the database handle the collation of those two entities.
But does that "get both" call belong to `PersonRepository` or to `CarRepository`? That's no longer clear. This means you've compromised your idealistic definition of a repository, because you are now trying to run queries that use multiple entity types at the same time.
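The single-round-trip version might look like this (all names hypothetical; plain maps stand in for the driver's result set so the sketch runs without a database): one JOIN replaces a person query plus one car query per person, and the flat rows are collated back into person → cars.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class PeopleWithCarsQuery {
    // One round trip instead of a query per person.
    static final String SQL =
        "SELECT p.name AS person, c.model AS car " +
        "FROM person p LEFT JOIN car c ON c.owner_id = p.id";

    // Collate the flat JOIN rows back into person -> list of cars.
    static Map<String, List<String>> collate(List<Map<String, String>> rows) {
        Map<String, List<String>> byPerson = new LinkedHashMap<>();
        for (Map<String, String> row : rows) {
            byPerson.computeIfAbsent(row.get("person"), k -> new ArrayList<>())
                    .add(row.get("car"));
        }
        return byPerson;
    }
}
```

Notice the query itself doesn't belong cleanly to either repository; that's exactly the compromise being described.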
This is why it's a tradeoff. Because the performance hit would otherwise be too significant, we trade away the "perfect" repositories in favor of better performance. So when you ask:
How should the repository handle the update process?
The real answer is: in whatever way maximizes what is most important to you.
- Accept the UpdatePersonDto object, fetch the current Person object from the DB, map the fields and then update
This leads to the cleanest code and ease of development, but it effectively doubles your network calls (fetch + update) which can become a bottleneck.
Then again, if you've got bandwidth to spare and performance is not the #1 priority, but developer availability is limited, it may be better to favor clean code so you can minimize development and maintenance time as best you can.
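Option 1 in sketch form (hypothetical names throughout; an in-memory map stands in for the database, where `findById` and `save` would each be a network round trip):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

record UpdatePersonDto(UUID personId, String newName) {}  // assumed DTO shape

final class Person {
    final UUID id;
    String name;
    Person(UUID id, String name) { this.id = id; this.name = name; }
    void rename(String newName) {
        // The domain model keeps its invariants during the mapping.
        if (newName == null || newName.isBlank())
            throw new IllegalArgumentException("name must be non-empty");
        this.name = newName;
    }
}

final class PersonRepository {
    final Map<UUID, Person> store = new HashMap<>();  // stand-in for the DB
    Person findById(UUID id) { return store.get(id); }
    void save(Person p) { store.put(p.id, p); }

    // Option 1: fetch (round trip 1), map the DTO onto the domain
    // model, then write it back (round trip 2).
    void update(UpdatePersonDto dto) {
        Person person = findById(dto.personId());
        person.rename(dto.newName());
        save(person);
    }
}
```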
- Accept the UpdatePersonDto object and perform an update on the DB just using the DTO. The Person domain model is not needed in this case
This is the more performant option. It requires handcrafting an update that only targets specific fields, which may take more development effort, but it will pay dividends in performance.
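Option 2 in sketch form (again all names are hypothetical, and executing the statement is left to whatever DB access library you use): the DTO feeds a field-targeted `UPDATE` directly, and the `Person` domain model is never loaded, so there's a single round trip.

```java
import java.util.List;
import java.util.UUID;

record UpdatePersonDto(UUID personId, String newName) {}  // assumed DTO shape

final class PersonUpdateStatements {
    // Handcrafted, field-targeted update: only the name column is touched.
    static final String UPDATE_NAME = "UPDATE person SET name = ? WHERE id = ?";

    // Parameters in bind order, taken straight from the DTO.
    static List<Object> bindings(UpdatePersonDto dto) {
        return List.of(dto.newName(), dto.personId());
    }
}
```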
- Require the Person model and leave the mapping to a higher layer
I'm not quite sure how this is different from option 1. I think the only difference here is that you split the "update person" responsibility over two classes.
That could be better codewise, but I don't think it's needed (based on my current understanding of your situation) and thus would advise against it.
Performancewise, I'm expecting this to be equal to option 1.
- Something else?
Options 1 and 2 tackle the most common tradeoff: ease of development versus runtime performance. But it's possible that you have other things to consider too (e.g. using a particular approach because your company uses it ubiquitously). In such a case, you need to find the approach that works best for your list of priorities.
Best Answer
If you have an entity as in a DDD entity (not what is generally called an entity in frameworks such as Entity Framework), then the entity should protect its invariants. Thus using the entity as a DTO is most likely wrong. A DTO is unvalidated input data from the user and should be treated as such. An entity is validated data that satisfies its type's invariants. Thus you should have a conversion between DTO and entity that validates the DTO and produces a domain entity object.
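That conversion step might be sketched like this (hypothetical names; not code from the question): the DTO is a dumb carrier of raw wire data, and the entity's constructor is where the invariants are enforced, so an invalid DTO can never become an entity.

```java
// Raw, unvalidated wire data: anything the user sent.
record RegisterPersonDto(String name, int age) {}

final class Person {
    private final String name;
    private final int age;

    Person(String name, int age) {
        // Invariants are enforced here, once, at the boundary.
        if (name == null || name.isBlank())
            throw new IllegalArgumentException("name must be non-empty");
        if (age < 0)
            throw new IllegalArgumentException("age must be non-negative");
        this.name = name;
        this.age = age;
    }

    // The DTO -> entity conversion: validation happens as a side
    // effect of construction, so invalid DTOs never become entities.
    static Person fromDto(RegisterPersonDto dto) {
        return new Person(dto.name(), dto.age());
    }

    String name() { return name; }
    int age() { return age; }
}
```

After this point, the rest of the code can trust any `Person` it holds, which is exactly the checking-all-over-the-place problem the anemic model runs into below.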
In many cases people skimp on this by using anemic domain models (http://www.martinfowler.com/bliki/AnemicDomainModel.html), where the "entities" generally contain no logic and all the invariant-protection logic lives in external code (so invariants need to be checked all over the place). This can work, but imho it leads to bugs. But it's something that happens pretty much everywhere, especially since a lot of frameworks encourage it by making it super easy to write code this way. This is imho too bad, since it means the easiest/fastest way of coding is not the one that leads to the best results (if you follow DDD, that is)...
But it's also important to recognise that in many cases DDD might be overkill. If you are basically making a CRUD application (which in many cases you probably are) then DDD might be overkill for your domain. Of course this doesn't mean that you should throw away all the good ideas and just go into full-on "do-anything mode", but in some cases not caring about certain parts of DDD can be the correct choice to make. The hard trick here is, of course, to identify if your domain is complex enough to warrant DDD or if it's easy enough to not bite yourself in the ass by foregoing certain parts of it. :)