Is it a good idea for an API to return only the ids of nested objects?

api json performance

I have this URL:

  /api/pallets/list

Which returns a JSON array that looks like this:

 [{
     palletId: 333,
     code: 'J050000081',
     grower: {
         growerId: 35,
         name: 'Grower Of Blueberries Inc'
     },
     species: {
         speciesId: 1,
         name: 'Blueberries'
     },
     caliber: {
         caliberId: 5,
         name: '10-12'
     },
  }, ...]

Names are often long, and if the list contains 5000 pallets, that is a lot of bytes spent on names.

By the time the client app calls api/pallets/list, it has already downloaded the lists of growers, species and calibers by calling api/growers/list, api/species/list and api/calibers/list.

Because of that, I'm wondering whether it is a good idea for the server to return only the ids of the nested objects, i.e.:

 [{
     palletId: 333,
     code: 'J050000081',
     grower: {
         growerId: 35,
     },
     species: {
         speciesId: 1,
     },
     caliber: {
         caliberId: 5,
     },
  }, ...]

And then the client app will have the responsibility of completing the JSON, by doing something like this:

 // JavaScript-style sketch

 // Just after fetching from api/pallets/list
 for (const pallet of clientApp.pallets) {
     // Replace each id-only stub with the full object the client already holds
     pallet.grower = clientApp.growers[pallet.grower.growerId];
     pallet.species = clientApp.species[pallet.species.speciesId];
     pallet.caliber = clientApp.calibers[pallet.caliber.caliberId];
 }

 // growers is a dictionary, keyed by growerId, with all the growers already downloaded from the server
 // species is a dictionary, keyed by speciesId, with all the species already downloaded from the server
 // calibers likewise, keyed by caliberId

I want to know if this is a good or bad idea for improving performance.
Is there a name for this practice?

The code would be much cleaner without a change like this, but this JSON array of 5000 pallets is too heavy. In the example I'm only showing 3 fields (grower, species, caliber), but in reality there are about 10. All of them have id + name + other subfields…

Best Answer

"I want to know if this is a good or bad idea for improving performance. Is there a name for this practice?"

It depends on your requirements and on the performance problem you are trying to fix. Ask yourself this question first: do I really have an issue with the response size?
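One way to answer the response-size question is to measure it before changing anything. The following is only a rough sketch: the sample data is synthetic, shaped like the pallet in the question, and the helper names make_pallet and payload_size are made up for illustration. It compares the raw JSON size of 5000 pallets with and without the nested names.

```python
import json


def make_pallet(i, with_names):
    """Build one synthetic pallet shaped like the example in the question."""
    grower = {"growerId": 35}
    species = {"speciesId": 1}
    caliber = {"caliberId": 5}
    if with_names:
        grower["name"] = "Grower Of Blueberries Inc"
        species["name"] = "Blueberries"
        caliber["name"] = "10-12"
    return {
        "palletId": i,
        "code": f"J{i:09d}",
        "grower": grower,
        "species": species,
        "caliber": caliber,
    }


def payload_size(count, with_names):
    """Return the size in bytes of the serialized JSON array."""
    pallets = [make_pallet(i, with_names) for i in range(count)]
    return len(json.dumps(pallets).encode("utf-8"))


full = payload_size(5000, with_names=True)
ids_only = payload_size(5000, with_names=False)
print(f"full: {full} bytes, ids only: {ids_only} bytes, saved: {full - ids_only} bytes")
```

Running the same comparison against a real response, with all ten or so nested fields, would show whether the saving is large enough to justify the extra client-side work.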

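If the growers, species or calibers change on the server, the client's locally stored copies go stale and the client remains unaware of the changes. So you have two options here:

  • Synchronise the locally stored lists periodically.
  • Reload the locally stored data and iterate over all 5k rows again to resolve the nested objects.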

But, if you have to reload all the data, where are the savings?

Unless you are concerned about real bandwidth or data plan constraints, I would not worry prematurely about the size of the response. Instead, I would enhance the REST API itself.

Pagination

/api/pallets/list?page=0&pageSize=500
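As a minimal sketch of what the server could do with page and pageSize, assuming zero-based pages as in the URL above: the function name paginate and the response envelope (page, pageSize, total, items) are assumptions made for illustration, not part of the original API. A real server would normally push the slicing down into the database query instead of slicing an in-memory list.

```python
def paginate(items, page, page_size):
    """Return one zero-based page of items plus paging metadata."""
    if page < 0 or page_size <= 0:
        raise ValueError("page must be >= 0 and page_size must be > 0")
    start = page * page_size
    chunk = items[start:start + page_size]
    return {
        "page": page,
        "pageSize": page_size,
        "total": len(items),  # lets the client work out how many pages exist
        "items": chunk,
    }


# Hypothetical data set: 1200 pallets served in pages of 500.
pallets = [{"palletId": i} for i in range(1200)]
first = paginate(pallets, page=0, page_size=500)
last = paginate(pallets, page=2, page_size=500)
print(len(first["items"]), len(last["items"]))
```

The client then requests only the pages it actually displays, so the cost of a 5000-row list is spread over several smaller responses.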

Dynamic representations

/api/pallets/list?fields=id,name,growers.name,species.name

There are battle-tested solutions for this, such as GraphQL or OData.
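If a full GraphQL or OData stack is more than you need, a hand-rolled fields parameter can be sketched in a few lines. This is only an illustration under stated assumptions: the helper names get_path and select_fields are hypothetical, the paths use the field names from the question's JSON (palletId, code, grower.name), and unknown fields are silently skipped, which a real API might prefer to reject.

```python
_MISSING = object()  # sentinel: distinguishes "absent" from a stored None


def get_path(obj, keys):
    """Follow a list of keys through nested dicts, or return _MISSING."""
    for key in keys:
        if not isinstance(obj, dict) or key not in obj:
            return _MISSING
        obj = obj[key]
    return obj


def select_fields(obj, fields):
    """Return a new dict containing only the requested dotted paths."""
    result = {}
    for path in fields:
        keys = path.split(".")
        value = get_path(obj, keys)
        if value is _MISSING:
            continue  # assumption: unknown fields are ignored
        target = result
        for key in keys[:-1]:
            target = target.setdefault(key, {})
        target[keys[-1]] = value
    return result


pallet = {
    "palletId": 333,
    "code": "J050000081",
    "grower": {"growerId": 35, "name": "Grower Of Blueberries Inc"},
    "species": {"speciesId": 1, "name": "Blueberries"},
    "caliber": {"caliberId": 5, "name": "10-12"},
}
query = "palletId,code,grower.name,species.name"  # value of ?fields=...
slim = select_fields(pallet, query.split(","))
print(slim)
```

With this approach the client decides per request whether it wants names, ids, or both, so the id-only response becomes one representation among several instead of a fixed contract.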

Combining both

/api/pallets/list?fields=id,name,growers.name,species.name&page=0&pageSize=500

ETag

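We could further enhance the solution with ETag: the client revalidates its cached response with a conditional request and receives 304 Not Modified, with no body, when nothing has changed.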
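The ETag exchange can be sketched without any web framework. This is a simplified illustration, not a complete implementation: make_etag and respond are hypothetical helpers, the tag is assumed to be a hash of the serialized body, and only a single exact-match If-None-Match value is handled (the real header may also carry a list of tags, weak tags, or `*`).

```python
import hashlib
import json


def make_etag(payload):
    """Serialize the payload and derive a quoted ETag from its hash."""
    body = json.dumps(payload, sort_keys=True).encode("utf-8")
    etag = '"' + hashlib.sha256(body).hexdigest() + '"'
    return etag, body


def respond(payload, if_none_match=None):
    """Return (status, headers, body) for a GET, honouring If-None-Match."""
    etag, body = make_etag(payload)
    if if_none_match == etag:
        # The client's cached copy is still current: send headers only.
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag, "Content-Type": "application/json"}, body


pallets = [{"palletId": 333, "code": "J050000081"}]

# First request: full body plus an ETag the client stores with its cache.
status, headers, body = respond(pallets)

# Later request: the client sends the stored tag back in If-None-Match.
status_again, _, body_again = respond(pallets, if_none_match=headers["ETag"])
print(status, status_again, len(body), len(body_again))
```

Note that in this sketch the server still builds the whole payload in order to hash it, so the saving is in bytes on the wire, not in server work.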

If you can't afford pagination, dynamic representations may help. ETag is a plus in any scenario.

All of the above approaches improve client-side performance, but the server suffers a load increase [1]. However, it's cheaper and easier to scale the server up or out than the client.


[1]: ETag is meant to save bandwidth, not to reduce the number of calls to the server.