Microservices Architecture – Large File and Data Transfer Best Practices

Tags: design, microservices, rest, xml

My company is currently adopting a microservice architecture, but we are encountering some growing pains (shock!) along the way. One of the key points of contention is how to communicate large quantities of data between our different services.

As a bit of background, we have a document store that serves as a repository for any document we might need to handle across the company. Interacting with the store is done via a service that provides a client with a unique ID and a location to stream the document to. The document's location can later be retrieved via a lookup with the provided ID.
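
For illustration, a minimal sketch of that interaction in Python; the base URL, endpoint paths, and field names here are hypothetical stand-ins, not our actual API:

    import requests

    DOC_STORE = "https://docstore.internal.example"  # hypothetical base URL

    # 1. Ask the document service for a new document slot; it answers with
    #    a unique ID plus an upload location to stream the bytes to.
    slot = requests.post(f"{DOC_STORE}/documents")
    slot.raise_for_status()
    doc = slot.json()  # e.g. {"id": "1f3a...", "upload_url": "https://..."}

    # 2. Stream the file to the returned location without buffering it all
    #    in memory (requests streams file objects passed as data).
    with open("contract.pdf", "rb") as f:
        requests.put(doc["upload_url"], data=f).raise_for_status()

    # 3. Later, any holder of the ID can look up where the document lives.
    loc = requests.get(f"{DOC_STORE}/documents/{doc['id']}/location")
    loc.raise_for_status()
    print(loc.json()["url"])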

The problem is this: does it make sense for all our microservices to accept this unique ID as part of their API for the purposes of interacting with documents? To me this feels inherently wrong: the services are no longer independent and rely upon the document store's service. While I acknowledge this might simplify API design and perhaps even yield some performance gains, the resulting coupling more than counterbalances the benefits.

Does anyone know how the rainbow unicorns (Netflix, Amazon, Google, etc.) handle large files / data exchange between their services?

Best Answer

Does anyone know how the rainbow unicorns (Netflix, Amazon, Google, etc.) handle large files / data exchange between their services?

Unfortunately, I do not know how they deal with such problems.

The problem is this: does it make sense for all our microservices to accept this unique ID as part of their API for the purposes of interacting with documents?

It violates the Single Responsibility Principle, which should be inherent in your microservice architecture. One microservice (logically one, even if physically many instances represent it) should deal with one topic.

In the case of your document store, you have one point where all queries for documents go (of course, you could split this logical unit into multiple document stores for several kinds of documents).

  • If your "application" needs to work on a document, it asks the respective microservice and processes its result(s).

  • If another service needs an actual document or parts of it, it has to ask the document service (see the sketch after this list).
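
As a sketch of that second point (the service name and endpoint are invented for illustration): a consuming service keeps only the ID and asks the document service for the bytes when it actually needs them, streaming in chunks rather than loading the whole file at once:

    import hashlib

    import requests

    DOC_SERVICE = "https://documents.internal.example"  # hypothetical

    def process_document(document_id: str) -> str:
        """Fetch a document by ID from the document service and process it
        in streamed chunks, so the whole file never sits in memory."""
        digest = hashlib.sha256()
        with requests.get(f"{DOC_SERVICE}/documents/{document_id}",
                          stream=True) as resp:
            resp.raise_for_status()
            for chunk in resp.iter_content(chunk_size=64 * 1024):
                digest.update(chunk)  # stand-in for real per-chunk work
        return digest.hexdigest()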

One of the key points of contention is how to communicate large quantities of data between our different services.

This is an architectural problem:

  1. Decrease the need to transfer large amounts of data

    Ideally, each service holds all of its own data and needs no transfer at all to serve a request. As an extension of this idea, if you do need to transfer data, think of redundancy (in a positive sense): does it make sense to keep the data redundantly in the many places where it is needed? Think of how possible inconsistencies might harm your processes. No transfer is faster than none at all (see the first sketch after this list).

  2. Decrease the size of the data itself

    Think of how you could compress your data, from actual compression algorithms up to smarter data structures. The less that goes over the wire, the faster you are (see the second sketch below).
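
To make point 1 concrete, here is a sketch of positive redundancy: a service keeps a local replica of just the data it needs, updated by events, so serving a request requires no transfer at all. The event shape and storage here are invented for illustration:

    import json
    import sqlite3

    # Local replica of only the customer fields this service needs.
    local = sqlite3.connect("customers_replica.db")
    local.execute(
        "CREATE TABLE IF NOT EXISTS customers (id TEXT PRIMARY KEY, name TEXT)"
    )

    def on_customer_updated(raw_event: bytes) -> None:
        # Apply a (hypothetical) CustomerUpdated event to the replica.
        # Between the upstream change and this handler running, the replica
        # is stale; weigh that inconsistency window against your processes.
        event = json.loads(raw_event)
        local.execute(
            "INSERT OR REPLACE INTO customers (id, name) VALUES (?, ?)",
            (event["id"], event["name"]),
        )
        local.commit()

And for point 2, a minimal baseline using gzip from the standard library; smarter data structures (columnar layouts, delta encoding, and so on) can cut even more:

    import gzip
    import json

    payload = {"rows": [{"id": i, "status": "ACTIVE"} for i in range(10_000)]}

    raw = json.dumps(payload).encode("utf-8")
    compressed = gzip.compress(raw)

    print(f"raw: {len(raw):,} bytes -> compressed: {len(compressed):,} bytes")
    # The receiver reverses this with gzip.decompress() and json.loads().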
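Whichever route you take, the document ID pattern you describe is essentially a claim check: small references travel between services, and only the service that actually needs the bytes redeems the ID, as in the streaming sketch further up.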
