Microservices Architecture – Large File and Data Transfer Best Practices

Tags: design, microservices, rest, xml

My company is currently adopting a microservice architecture, but we are encountering some growing pains (shock!) along the way. One of the key points of contention is how to communicate large quantities of data between our different services.

As a bit of background, we have a document store that serves as a repository for any document we might need to handle across the company. Interacting with the store is done via a service that provides a client with a unique ID and a location to stream the document to. The document's location can later be retrieved via a lookup with the provided ID.
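
For illustration, a minimal sketch of that interaction in Python; the base URL, endpoint paths, and field names here are hypothetical stand-ins, not our actual API:

    import requests

    DOC_STORE = "https://docstore.internal.example"  # hypothetical base URL

    # 1. Ask the document service for a new document slot; it answers with
    #    a unique ID plus an upload location to stream the bytes to.
    slot = requests.post(f"{DOC_STORE}/documents")
    slot.raise_for_status()
    doc = slot.json()  # e.g. {"id": "1f3a...", "upload_url": "https://..."}

    # 2. Stream the file to the returned location without buffering it all
    #    in memory (requests streams file objects passed as data).
    with open("contract.pdf", "rb") as f:
        requests.put(doc["upload_url"], data=f).raise_for_status()

    # 3. Later, any holder of the ID can look up where the document lives.
    loc = requests.get(f"{DOC_STORE}/documents/{doc['id']}/location")
    loc.raise_for_status()
    print(loc.json()["url"])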

The problem is this: does it make sense for all our microservices to accept this unique ID as part of their API for the purposes of interacting with documents? To me this feels inherently wrong: the services are no longer independent and rely upon the document store's service. While I acknowledge this might simplify API design and perhaps even yield some performance gains, the resulting coupling more than counterbalances the benefits.

Does anyone know how the rainbow unicorns (Netflix, Amazon, Google, etc.) handle large files / data exchange between their services?

Best Answer

Does anyone know how the rainbow unicorns (Netflix, Amazon, Google, etc.) handle large files / data exchange between their services?

Unfortunately, I do not know how they deal with such problems.

The problem is this: does it make sense for all our microservices to accept this unique ID as part of their API for the purposes of interacting with documents?

It violates the Single Responsibility Principle, which should be inherent in your microservice architecture. One microservice (logically one, even if physically many instances represent it) should deal with one topic.

In the case of your document store, you have one point where all queries for documents go (of course, you could split this logical unit into multiple document stores for several kinds of documents).

  • If your "application" needs to work on a document, it asks the respective microservice and processes its result(s).

  • If another service needs an actual document or parts of it, it has to ask the document service (see the sketch after this list).
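
As a sketch of that second point (the service name and endpoint are invented for illustration): a consuming service keeps only the ID and asks the document service for the bytes when it actually needs them, streaming in chunks rather than loading the whole file at once:

    import hashlib

    import requests

    DOC_SERVICE = "https://documents.internal.example"  # hypothetical

    def process_document(document_id: str) -> str:
        """Fetch a document by ID from the document service and process it
        in streamed chunks, so the whole file never sits in memory."""
        digest = hashlib.sha256()
        with requests.get(f"{DOC_SERVICE}/documents/{document_id}",
                          stream=True) as resp:
            resp.raise_for_status()
            for chunk in resp.iter_content(chunk_size=64 * 1024):
                digest.update(chunk)  # stand-in for real per-chunk work
        return digest.hexdigest()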

One of the key points of contention is how to communicate large quantities of data between our different services.

This is an architectural problem:

  1. Decrease the need to transfer large amounts of data

    Ideally, each service holds all of its own data and needs no transfer at all to serve a request. As an extension of this idea, if you do need to transfer data, think of redundancy (in a positive sense): does it make sense to keep the data redundantly in the many places where it is needed? Think of how possible inconsistencies might harm your processes. No transfer is faster than none at all (see the first sketch after this list).

  2. Decrease the size of the data itself

    Think of how you could compress your data, from actual compression algorithms up to smarter data structures. The less that goes over the wire, the faster you are (see the second sketch below).
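
To make point 1 concrete, here is a sketch of positive redundancy: a service keeps a local replica of just the data it needs, updated by events, so serving a request requires no transfer at all. The event shape and storage here are invented for illustration:

    import json
    import sqlite3

    # Local replica of only the customer fields this service needs.
    local = sqlite3.connect("customers_replica.db")
    local.execute(
        "CREATE TABLE IF NOT EXISTS customers (id TEXT PRIMARY KEY, name TEXT)"
    )

    def on_customer_updated(raw_event: bytes) -> None:
        # Apply a (hypothetical) CustomerUpdated event to the replica.
        # Between the upstream change and this handler running, the replica
        # is stale; weigh that inconsistency window against your processes.
        event = json.loads(raw_event)
        local.execute(
            "INSERT OR REPLACE INTO customers (id, name) VALUES (?, ?)",
            (event["id"], event["name"]),
        )
        local.commit()

And for point 2, a minimal baseline using gzip from the standard library; smarter data structures (columnar layouts, delta encoding, and so on) can cut even more:

    import gzip
    import json

    payload = {"rows": [{"id": i, "status": "ACTIVE"} for i in range(10_000)]}

    raw = json.dumps(payload).encode("utf-8")
    compressed = gzip.compress(raw)

    print(f"raw: {len(raw):,} bytes -> compressed: {len(compressed):,} bytes")
    # The receiver reverses this with gzip.decompress() and json.loads().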
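Whichever route you take, the document ID pattern you describe is essentially a claim check: small references travel between services, and only the service that actually needs the bytes redeems the ID, as in the streaming sketch further up.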
