Composition of Data Transfer Objects

composition, data structures, dto

Let's say I want to create a shop/order system.
I'll have an Order DTO to which I'll map the data from a request and results from the database.
The object will consist of an order number, a customer number and a list of articles.

The order number is pretty simple: I'll just use an int to store it. But what about the customer? I'll also have a Customer DTO, so I could use it inside the Order DTO instead of storing the customer number as an int.

In my case the customer number consists of a store_no (the store where the customer is registered) and a customer_no (store internal customer number).

To me it seems to be cleaner to have a structure which looks like this:

Order {
 int order_no;
 Customer cust;
}

Customer {
 int store_no;
 int cust_no;
}

instead of this:

Order {
 int order_no;
 int store_no;
 int cust_no;
}

Customer {
 int store_no;
 int cust_no;
}

My problem:
The Customer DTO will have a lot more information than just the customer number. Like a first name, last name, address, user name, some flags and so on. Things I don't care about when I just want to save an order with order_no 123 for the customer with store_no 12 cust_no 543.

So in this case using the Customer DTO inside of the Order DTO seems to be 'too much'.

Same goes for the articles.

Are there any best practices for creating such DTOs? Should I nest DTOs inside each other, or repeat the same information in different DTOs?

Best Answer

> My problem: The Customer DTO will have a lot more information than just the customer number. Like a first name, last name, address, user name, some flags and so on. Things I don't care about when I just want to save an order with order_no 123 for the customer with store_no 12 cust_no 543.

A data transfer object's purpose is to represent the data to be transferred to another process, e.g. from your backend to a web frontend. If the information you are transferring has a nested structure, also using a nested type for the DTO does make sense. However, the DTO must not include unnecessary data you aren't actually using – that's just misleading.

A DTO is absolutely not the same as your domain model. Your model describes entities like a Customer or an Order with all properties that are relevant for your problem domain. This includes properly modelling the relationship between an Order and a Customer. It is not unusual to end up with a complex object graph describing all of these relationships (and equivalently: many foreign keys in a relational database).

It sounds to me like your Customer “DTO” is actually an accurate domain model, not just a DTO. It would therefore be wrong for an Order to include a full Customer instance, if all that is transferred is actually just an ID. Assuming you're transferring this data as JSON, I'd just look at the JSON and derive my DTOs from that:

{
  "order_no": 1234,
  "store_no": 1234,
  "cust_no": 1234
}

→ best represented as a single, flat DTO.

{
  "order_no": 1234,
  "customer": { "cust_no": 1234, "store_no": 1234 }
}

→ probably a separate Order and Customer type.
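
As a minimal sketch of deriving the DTOs from the JSON, the two payloads could map onto Python dataclasses roughly like this. The class names (`FlatOrderDto`, `OrderDto`, `CustomerRefDto`) and the `from_json` helpers are illustrative assumptions, not something from the question; the sample values reuse the order from the question.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class FlatOrderDto:
    """Mirrors the flat payload: every field sits at the top level."""
    order_no: int
    store_no: int
    cust_no: int

    @classmethod
    def from_json(cls, text: str) -> "FlatOrderDto":
        data = json.loads(text)
        return cls(data["order_no"], data["store_no"], data["cust_no"])


@dataclass(frozen=True)
class CustomerRefDto:
    """Only the composite customer number, no profile data."""
    store_no: int
    cust_no: int


@dataclass(frozen=True)
class OrderDto:
    """Mirrors the nested payload: the "customer" key is its own object."""
    order_no: int
    customer: CustomerRefDto

    @classmethod
    def from_json(cls, text: str) -> "OrderDto":
        data = json.loads(text)
        return cls(data["order_no"], CustomerRefDto(**data["customer"]))


flat = FlatOrderDto.from_json('{"order_no": 123, "store_no": 12, "cust_no": 543}')
nested = OrderDto.from_json(
    '{"order_no": 123, "customer": {"cust_no": 543, "store_no": 12}}'
)
```

Either way, the DTO carries exactly the fields that appear on the wire and nothing from the full customer model.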

Another possible approach is having different DTOs for different levels of detail, e.g. a customer could be represented as a full customer with complete profile information, or just as a brief customer summary with only the most essential information, or even just the ID. This adds a lot of complexity since the same entity can be represented by multiple DTO types, but keeps you from transferring unnecessary data. That requires a design decision.
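
The "different levels of detail" idea could look something like the following sketch. All type names (`CustomerId`, `CustomerSummary`, `CustomerProfile`) and the sample customer data are hypothetical; the point is only that an order references the smallest representation it needs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomerId:
    """Smallest representation: just the composite customer number."""
    store_no: int
    cust_no: int


@dataclass(frozen=True)
class CustomerSummary:
    """Brief representation for listings."""
    id: CustomerId
    display_name: str


@dataclass(frozen=True)
class CustomerProfile:
    """Full representation with profile information."""
    id: CustomerId
    first_name: str
    last_name: str
    address: str
    user_name: str

    def to_summary(self) -> CustomerSummary:
        return CustomerSummary(self.id, f"{self.first_name} {self.last_name}")


@dataclass(frozen=True)
class OrderDto:
    order_no: int
    customer: CustomerId  # the order only carries the ID, not the profile


# Made-up sample data for illustration.
profile = CustomerProfile(CustomerId(12, 543), "Ada", "Lovelace", "1 Example St", "ada")
order = OrderDto(order_no=123, customer=profile.id)
summary = profile.to_summary()
```

The cost is visible here too: three types now describe one entity, and conversions between them have to be maintained.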

Related Solutions

Merge directed acyclic graphs minimizing number of nodes

I think your structure is the line graph of the minimal DAWG. I generated these three years ago by building the minimal DAWG, deriving its line graph, and then minimising the line graph. I searched the literature and Google extensively and never found this last step. I concluded that the DAWG was generally more useful, but the line DAWG was better for display to those unaccustomed to DAWGs. There is another name for a DAWG used in natural language processing, a Word Something; the something eludes me right now.

Your adc example suggests a DAWG where every node has a # edge to the sink.

REST API Design – Handling Partial Nested Objects

In REST you can treat everything you listed as a separate resource. It is a different approach. In general you want to denormalize (database term) the data in a REST resource to fit your use case. So your examples are just fine: they include all the data you need at once, so you can easily build a client for them.

In REST we can say you have the following resources:

/schools (list of schools)
/schools/1 (school 1 with list of students)
/students (maybe list of students)
/students/1 (student 1 with list of courses for example).
/courses (list of all courses)
/courses/1 (course 1, which may contain all students enrolled)

You could also have in addition:

/schools/1/students (list of all students at this school)
/students/1/courses (all courses for this student)
/students/1/courses-finished (all finished courses for this student)

> Is there a smarter approach that would allow us to cache just one copy of each object, and to prevent multiple fetches to show basic screens?

Object != resource in REST. The list of schools for example is also a resource. So you might have 3 database tables here but multiple resources.

Client-side data stores

Having data stored in separate data stores on the client is a different matter. I know it has been implemented in Meteor: https://meteorhacks.com/understanding-mergebox/ which lets you start with a smaller initial document that expands when needed.

Subscriptions

So what you do is this: you subscribe to schools and students (only the info needed for listings, for example id & name). Then, when you know which student you want more details of, you subscribe to that student in full detail. Behind the scenes this populates your client-side database with all details of that specific student.
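
To illustrate the idea of a small document that expands when more detail arrives, here is a minimal sketch of a client-side store that merges partial documents by id. This is not Meteor's actual API; `ClientStore` and its methods are hypothetical names, and the student data is made up.

```python
class ClientStore:
    """Hypothetical client-side store keyed by (collection, id)."""

    def __init__(self) -> None:
        self._docs = {}  # maps (collection, id) -> merged document

    def merge(self, collection: str, doc: dict) -> None:
        # New fields are added and existing ones overwritten; nothing is dropped.
        self._docs.setdefault((collection, doc["id"]), {}).update(doc)

    def get(self, collection: str, doc_id: int) -> dict:
        # Return a copy so callers cannot mutate the stored document.
        return dict(self._docs[(collection, doc_id)])


store = ClientStore()
# The listing subscription delivers only id and name.
store.merge("students", {"id": 1, "name": "Alice"})
# The detail subscription later delivers the full document.
store.merge("students", {"id": 1, "name": "Alice", "courses": [10, 11]})
```

After both merges there is still a single copy of student 1 in the store, now with the extra fields.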

REST

The subscription approach is not REST; it's a totally different way of working. With REST, in general, you should not have to associate resources on the client.

Normalisation

> Instead of normalizing these objects, we could store schools and students with their nested partial objects. However, this means data duplication -- each student at Jefferson High would have the name of the school nested.

No. A REST resource (school, student, etc.) is NOT the same as your database tables. (http://roy.gbiv.com/untangled/2008/rest-apis-must-be-hypertext-driven#comment-743) You can build the school resource with a join query behind the scenes; REST does not and should not care about that. If performance becomes an issue, you can denormalize the data behind that query and the REST resource still looks the same. If you switch databases, the resource still looks the same. If you switch to another student management system, the resource still looks the same. That's a big advantage you get by using REST. It was originally designed to connect lots of unknown systems; with a uniform API that is not tied to the backend technology, systems can work together more easily.

Caching

When a school name changes, you should invalidate all caches that use that school name. You can do that using ETags. A fairly simple example is: http://fideloper.com/api-etag-conditional-get

So, let's assume you use a join to generate the school resource with the list of student names. And let's assume that you have a modifiedAt field in both schools and students table:

When you include both modifiedAt fields in your resource (or at least use them) and generate a unique ETag from them, the following will happen:

When a school name is updated: the ETag becomes invalid for ONE school and ALL students of that school.
When a student name is updated: the ETag becomes invalid for ALL schools the student belongs to and for ONE student (the one that changed).
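
One way to derive such an ETag is to hash the modifiedAt values of every row that feeds the resource. The sketch below assumes the rows are plain dictionaries with `id` and `modifiedAt` keys; the function name `school_etag`, the use of SHA-256, and the timestamps are illustrative choices, not something the linked article prescribes.

```python
import hashlib


def school_etag(school: dict, students: list) -> str:
    """Derive an ETag from the modifiedAt values of the joined rows."""
    parts = [f"school:{school['id']}:{school['modifiedAt']}"]
    # Sort so the tag does not depend on the order the rows came back in.
    for s in sorted(students, key=lambda row: row["id"]):
        parts.append(f"student:{s['id']}:{s['modifiedAt']}")
    digest = hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()
    return f'"{digest}"'  # ETag header values are quoted strings


# Made-up sample rows for illustration.
school = {"id": 1, "name": "Jefferson High", "modifiedAt": "2024-01-01T00:00:00Z"}
students = [
    {"id": 7, "name": "Alice", "modifiedAt": "2024-01-02T00:00:00Z"},
    {"id": 8, "name": "Bob", "modifiedAt": "2024-01-03T00:00:00Z"},
]

before = school_etag(school, students)
students[0]["modifiedAt"] = "2024-02-01T00:00:00Z"  # one student was updated
after = school_etag(school, students)
```

Because the student's timestamp is part of the hash input, updating one student changes the tag of the school resource that lists that student.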

That lets all the caches in between know when to invalidate their copy. You can also pre-cache those resources: as long as the ETag still matches, the cached instance can be returned. So if the user is offline, they can just use the local resources. For example, after login you could load all resources for this user by requesting them in the background:

/students/1
/students/1/courses (even all filtered courses)
And then all courses the student is enrolled in.

They will then be available in the local cache. This is also used for optimistic loading: show the copy from the cache instantly, then check in the background whether it has been updated. If it has, reload the data on screen.
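
The optimistic-loading flow can be sketched as follows. The `CachedClient` class and the fake server are hypothetical stand-ins for a real HTTP client and server; only the status codes 200 and 304 and the compare-the-ETag behaviour follow the standard conditional GET mechanism.

```python
class CachedClient:
    """Shows the cached copy first, then revalidates it with the ETag."""

    def __init__(self, fetch):
        self._fetch = fetch  # callable(url, if_none_match) -> (status, etag, body)
        self._cache = {}     # url -> (etag, body)

    def load(self, url, render):
        cached = self._cache.get(url)
        if cached is not None:
            render(cached[1])  # optimistic: show the cached copy instantly
        etag = cached[0] if cached else None
        status, new_etag, body = self._fetch(url, etag)
        if status == 200:      # resource changed (or was never cached)
            self._cache[url] = (new_etag, body)
            render(body)
        # status 304: the cached copy is still valid, nothing to redraw


def make_fake_server(resources):
    """Hypothetical stand-in for an HTTP server: url -> (etag, body)."""
    def fetch(url, if_none_match=None):
        etag, body = resources[url]
        if if_none_match == etag:
            return 304, etag, None  # Not Modified
        return 200, etag, body
    return fetch


resources = {"/students/1": ('"v1"', {"id": 1, "name": "Alice"})}
client = CachedClient(make_fake_server(resources))
shown = []
client.load("/students/1", shown.append)  # not cached yet: fetched and shown
client.load("/students/1", shown.append)  # cached copy shown, server says 304
resources["/students/1"] = ('"v2"', {"id": 1, "name": "Alicia"})
client.load("/students/1", shown.append)  # cached copy shown, then the update
```

In the last call the screen is drawn twice: once with the stale cached copy and once with the fresh data.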

Understand that there are multiple caches: the server may use one, some intermediate cache might be active, the browser caches, and your own code may cache. This standard mechanism gets almost every well-implemented HTTP cache to behave the way you want.

HATEOAS

Read about HATEOAS, which makes it easier for you to fetch those students. Just include a URL to them in the school resource; then you can simply follow the links. Examples: https://www.infoq.com/articles/webber-rest-workflow and https://developer.paypal.com/docs/integration/direct/paypal-rest-payment-hateoas-links/

That makes it more manageable to get those sub resources.

It will allow you for example to show a list view with links on the student page to all his courses. And on top you might show a link to all his finished courses (when available in the api). And that business logic would all be handled by the api. So your client could just do:

/students/1 (so you get all data on the student)

if(student.links.finished) {
  console.log(student.links.finished);
}

When he has no finished courses you could hide the link that way for example.

This can also be expanded like:

student.links.courses = {
  latest: "/students/1/new-courses",
  finished: "/students/1/courses-finished",
  failed: "/students/1/courses-failed"
};

That way you could on your client just do a forEach loop and show all available filters.
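
A minimal sketch of that loop, assuming the student resource has already been parsed into a dictionary whose `links.courses` entry has the shape shown above (the function name `course_filters` is hypothetical):

```python
def course_filters(student: dict) -> list:
    """Return (label, url) pairs for whatever course links the API included."""
    links = student.get("links", {}).get("courses", {})
    return list(links.items())


# Sample resource: this student has no failed courses, so the API
# simply leaves that link out.
student = {
    "id": 1,
    "links": {
        "courses": {
            "latest": "/students/1/new-courses",
            "finished": "/students/1/courses-finished",
        }
    },
}

for label, url in course_filters(student):
    print(f"{label}: {url}")
```

The client never decides which filters exist; it only renders the links the API chose to send.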

Strategy

Of course you can mix both. You could write a merge-box-like implementation that uses a REST API. That might be worth it if you want to re-use that data on the client and do, for example, complex searches or filtering on it. That's the advantage of having a client-side database.

On the other hand, if you just deliver the right data as REST resources, there is no direct advantage in converting that data into a client-side database and then reading from it; you can just use the original resource, because it contains exactly what you need.

So it is a trade-off depending on your exact use-case. If you can keep it simple and just use the REST resources that will work just fine. And it's simple which is a great advantage.

If you have to do complex data searches etc. the local client database might be a good fit.
