Microservice Architecture for Composite Data Entities


We have a high-read, low-write website whose back end currently has a monolithic architecture.

We have recently started breaking it into micro-services.
So far we have designed these micro-services: Product, Price, Offers and Content (images, videos, etc.).

There is another micro-service, "Recommend". This service runs some logic and returns the ids of similar products for a given product id.

On our product page we want to show product details, prices, offers and an image. This is very similar to an Amazon product page.
This part looks fairly simple, as we call all micro-services in parallel and compose the required entity.
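For reference, a minimal sketch of that parallel composition, assuming an async client. The four `fetch_*` functions are hypothetical stand-ins for HTTP calls to the Product, Price, Offers and Content services, and the field names are made up for illustration:

```python
import asyncio

# Hypothetical stand-ins for HTTP calls to the four services.
async def fetch_product(product_id):
    await asyncio.sleep(0.01)  # simulated network latency
    return {"id": product_id, "name": f"Product {product_id}"}

async def fetch_price(product_id):
    await asyncio.sleep(0.01)
    return {"amount": 19.99, "currency": "USD"}

async def fetch_offers(product_id):
    await asyncio.sleep(0.01)
    return [{"code": "SAVE10"}]

async def fetch_content(product_id):
    await asyncio.sleep(0.01)
    return {"image_url": f"https://example.com/img/{product_id}.jpg"}

async def compose_product_page(product_id):
    # All four calls are independent, so they run concurrently;
    # total latency is roughly that of the slowest call.
    product, price, offers, content = await asyncio.gather(
        fetch_product(product_id),
        fetch_price(product_id),
        fetch_offers(product_id),
        fetch_content(product_id),
    )
    return {
        **product,
        "price": price,
        "offers": offers,
        "image_url": content["image_url"],
    }

page = asyncio.run(compose_product_page("p1"))
print(page)
```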

The next section is "Similar products", which shows basic details of multiple similar products. This section is similar to Amazon's "Sponsored products" section.


As per our current approach, we first invoke the "Recommend" service, which returns a list of product ids. We then send these ids to all the other micro-services to fetch the product details, prices, offers and images.
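A rough sketch of this two-stage flow, assuming each downstream service offers a batch lookup that takes a list of ids (the `*_by_ids` functions and `recommend` are hypothetical placeholders, not real endpoints):

```python
import asyncio

# Hypothetical placeholders for the Recommend service and for
# batch endpoints on the other services.
async def recommend(product_id):
    return ["p2", "p3"]

async def products_by_ids(ids):
    return {i: {"name": f"Product {i}"} for i in ids}

async def prices_by_ids(ids):
    return {i: {"amount": 10.0} for i in ids}

async def offers_by_ids(ids):
    return {i: [] for i in ids}

async def images_by_ids(ids):
    return {i: f"https://example.com/img/{i}.jpg" for i in ids}

async def similar_products(product_id):
    # Stage 1: this call must finish first, because its result
    # (the ids) is the input of every other call.
    ids = await recommend(product_id)
    # Stage 2: one batch call per service, all issued concurrently.
    products, prices, offers, images = await asyncio.gather(
        products_by_ids(ids),
        prices_by_ids(ids),
        offers_by_ids(ids),
        images_by_ids(ids),
    )
    # Merge per id, keeping the order returned by Recommend.
    return [
        {
            "id": i,
            **products[i],
            "price": prices.get(i),
            "offers": offers.get(i, []),
            "image_url": images.get(i),
        }
        for i in ids
        if i in products
    ]

similar = asyncio.run(similar_products("p1"))
print(similar)
```

With batch lookups the page needs two sequential round trips regardless of how many similar products are shown.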

This approach works fine, but it raises some questions:

  1. Some micro-service calls depend on others. We have to wait for the first call to complete before we can start the others. From what I've read, micro-service calls should be as independent of each other as possible.

  2. Is it even the correct approach? Could there be some better approach such as:

    i. Another service that performs this composition and caches the composed data. This improves overall performance, but the new service has to listen to changes from all micro-services and invalidate its cache accordingly.

    ii. Sync some high-traffic data, such as prices, to the Product micro-service. But this increases complexity and makes the micro-services somewhat dependent on each other, which goes against micro-service principles.
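To make option (i) concrete, here is a minimal in-memory sketch of a composition cache that is invalidated by change events. The class name, the event shape (`{"product_id": ...}`) and the `compose` callback are all assumptions for illustration; a real service would receive the events from a message bus and would likely use a shared cache:

```python
class CompositeCache:
    """Caches composed product entities and drops them on change events."""

    def __init__(self, compose):
        self._compose = compose   # function that calls the other services
        self._entries = {}

    def get(self, product_id):
        # Compose only on a cache miss.
        if product_id not in self._entries:
            self._entries[product_id] = self._compose(product_id)
        return self._entries[product_id]

    def on_change_event(self, event):
        # Any service (Price, Offers, ...) reporting a change for a
        # product invalidates the composed entry for that product.
        self._entries.pop(event["product_id"], None)


# Demo with a fake price store standing in for the Price service.
prices = {"p1": 10}
calls = []

def compose(product_id):
    calls.append(product_id)
    return {"id": product_id, "price": prices[product_id]}

cache = CompositeCache(compose)
first = cache.get("p1")    # miss: composed from the services
second = cache.get("p1")   # hit: served from the cache

prices["p1"] = 12
cache.on_change_event({"product_id": "p1"})
third = cache.get("p1")    # miss again after invalidation
```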

I would like to know what others are doing in such cases.
I checked on StackExchange but could not find anything that matches my problem.
The closest I found: How to query data from multiple microservices

Best Answer

As with many issues in the field of software engineering, there are some tradeoffs to consider here. Yes, you do want microservices to be as independent as possible, but since they need to work as a single system, you can't avoid all dependencies. The tough point is deciding how much dependency is still OK while giving you the end result you need. Apart from very simple systems, you will always have some services which need to access data produced by others.

Two common approaches are:

  • service A calls service B to get its data
  • service A owns its own copy of the data which originates in service B

Hybrid approaches are also possible, such as using the former method for some kinds of data and the latter for others.

Main advantages and disadvantages to consider:

  • service A calls service B to get its data
    • pro
      • data is fetched at the latest possible moment, so it is as up-to-date as possible
      • simplicity: no data duplication and synchronization issues
    • con
      • performance: calls to service A become longer since it has to call service B during each call
      • reliability: if service B goes down, service A becomes unusable as well (and any performance issues propagate likewise)
  • service A owns its own copy of the data which originates in service B
    • pro
      • performance: data is available directly with service A's "own" data
      • reliability: if service B goes down or becomes slow, service A is not affected
    • con
      • complexity: you have to provide a mechanism of propagating data updates e.g. via events sent over a message bus
      • data duplication requires extra storage, network bandwidth for synchronization, etc.
      • there will be a lag between the data being updated in service B and this information being propagated to service A; whether this is a problem will depend on the context
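As an illustration of the second approach, here is a toy sketch in which service A (Product) keeps its own copy of prices and updates it from events published by service B (Price). The in-memory `MessageBus`, the topic name `price.changed` and the event fields are assumptions; a real setup would use a broker, and delivery would be asynchronous, which is where the lag mentioned above comes from:

```python
from collections import defaultdict

class MessageBus:
    """Toy in-process bus; delivers events synchronously."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, event):
        for handler in self._subscribers[topic]:
            handler(event)


class PriceService:
    """Service B: owns the price data and announces every change."""

    def __init__(self, bus):
        self._bus = bus
        self._prices = {}

    def set_price(self, product_id, amount):
        self._prices[product_id] = amount
        self._bus.publish(
            "price.changed", {"product_id": product_id, "amount": amount}
        )


class ProductService:
    """Service A: keeps a local replica of prices, fed by events."""

    def __init__(self, bus):
        self._price_replica = {}
        bus.subscribe("price.changed", self._on_price_changed)

    def _on_price_changed(self, event):
        self._price_replica[event["product_id"]] = event["amount"]

    def product_view(self, product_id):
        # No call to PriceService here: the read is served from local
        # data, so it still works if PriceService is down or slow.
        return {"id": product_id, "price": self._price_replica.get(product_id)}


bus = MessageBus()
price_service = PriceService(bus)
product_service = ProductService(bus)

price_service.set_price("p1", 19.99)
view = product_service.product_view("p1")
print(view)
```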

In practice, for large systems, the approach using data duplication is preferable due to improved reliability and performance. Moving data transfers to asynchronous communication wherever possible can be very helpful, and is actually essential for a truly cloud-native environment, but comes at the costs outlined above.

One factor you should consider is that your Recommend service probably has to use some of the product data anyway: recommendations are presumably based on the products' properties, so the service needs access to those properties. It could then return them in its responses as well. However, if you need to return many details which the service does not need for itself (e.g. picture URLs or some details of the text formatting), you may be better off returning only product IDs and pulling such data from an external service in real time.
