REST – Which is the best option to exchange large amounts of data in a micro-service architecture

Tags: aws, microservices, rest

The application I am working on needs to extract text from various proprietary document formats, such as Microsoft Office documents (doc, ppt, xls), PDF, and so on.

I am planning to implement a micro-service that takes a document in a proprietary format as input and returns the extracted text as output.

This solution requires the micro-service to exchange a large amount of data per request (on the order of 1 MB to 100 MB). The expectation is that the micro-service should be able to scale to 1000 requests per second.

With respect to this solution, I want to understand:

  • Is it OK to transfer data at this rate in a micro-service architecture?
  • I am planning to use REST APIs to transfer the data. Is that a good option?

Best Answer

There are some important aspects you should consider first.

Streaming

Let's imagine the 100 MB file is received by service A, which transfers it to service B, which, in turn, uses service C to do the actual parsing of the proprietary format.

The wrong approach would be for services A and B to start sending the file to the underlying service only after they have completely received it from the client.

Instead, as soon as they start receiving the file, they should stream it to the underlying service.

This means that you wait for the time it takes to transfer 100 MB only once rather than three times, plus the latency added by each hop.
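As a rough illustration of the streaming idea, here is a minimal sketch of what an intermediary does: it forwards each chunk as soon as it arrives instead of buffering the whole file. The function name `relay` and the 64 KB chunk size are assumptions made for this example. In a real service, `source` would be the incoming request body and `sink` the outgoing request to the underlying service; in-memory streams stand in for both here.

```python
import io
from typing import BinaryIO

CHUNK_SIZE = 64 * 1024  # assumed chunk size; tune for your workload


def relay(source: BinaryIO, sink: BinaryIO, chunk_size: int = CHUNK_SIZE) -> int:
    """Copy source to sink chunk by chunk and return the number of bytes relayed.

    Only one chunk is held in memory at a time, so the intermediary never
    needs to store the complete file before it starts forwarding.
    """
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # an empty read means end of stream
            break
        sink.write(chunk)  # forward immediately
        total += len(chunk)
    return total


# Hypothetical usage with in-memory streams standing in for network streams.
incoming = io.BytesIO(b"x" * 200_000)
outgoing = io.BytesIO()
relayed = relay(incoming, outgoing)
```

With this shape, the memory used by an intermediary depends on the chunk size rather than on the size of the document.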

Latency

Latency, on the other hand, cannot be avoided. Every intermediary service still has to open an HTTP/HTTPS connection to the underlying service before it can start transferring the file.

If your micro-services are located in the same data center, chances are the latency is a matter of a few milliseconds. If the services are hosted in different data centers, the latency may grow. With a high number of intermediaries, this can become a problem, and it will affect even small requests.
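To put rough numbers on this, here is a back-of-the-envelope model comparing the two approaches. All figures are assumptions chosen for illustration (125 MB/s of bandwidth per link, 2 ms of connection latency per hop, three hops for A, B and C), and the streaming formula is an idealization that ignores the small delay needed to fill the pipeline with the first chunk.

```python
def store_and_forward_seconds(size_mb: float, bandwidth_mb_per_s: float,
                              hop_latency_s: float, hops: int) -> float:
    """Each hop waits for the whole file before sending it on."""
    return hops * (hop_latency_s + size_mb / bandwidth_mb_per_s)


def streaming_seconds(size_mb: float, bandwidth_mb_per_s: float,
                      hop_latency_s: float, hops: int) -> float:
    """Hops forward chunks as they arrive: one transfer time plus per-hop latency."""
    return hops * hop_latency_s + size_mb / bandwidth_mb_per_s


# Assumed figures: 100 MB file, 125 MB/s links, 2 ms latency per hop, 3 hops.
buffered = store_and_forward_seconds(100, 125, 0.002, 3)
streamed = streaming_seconds(100, 125, 0.002, 3)
```

Under these assumptions the buffered path takes about 2.406 s and the streamed path about 0.806 s. The latency term (6 ms in total) is the same in both cases, which is why it matters most for small requests, where the transfer time no longer dominates.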

Possible DoS

When using the streaming technique, you should check that you don't open yourself to a possible denial-of-service (DoS) attack. The risk is that the intermediaries will keep the HTTP connection open for as long as the client is sending the file. The attack would then consist of sending lots of files at a very low speed in order to exhaust the connections that the services are able to process.
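One common mitigation, shown here only as a sketch, is to enforce an overall deadline and a minimum transfer rate on each upload and to drop connections that fall below them. The names `guarded_chunks` and `SlowUploadError` and all the thresholds are hypothetical and chosen for this example. Note that this check only runs when a chunk arrives, so a real service would also need a read timeout on the socket to catch a client that stops sending entirely.

```python
import io
import time
from typing import BinaryIO, Callable, Iterator


class SlowUploadError(Exception):
    """Raised when an upload is too slow or takes too long (hypothetical name)."""


def guarded_chunks(source: BinaryIO, *,
                   max_seconds: float = 30.0,
                   min_bytes_per_second: float = 64 * 1024,
                   grace_seconds: float = 2.0,
                   chunk_size: int = 64 * 1024,
                   clock: Callable[[], float] = time.monotonic) -> Iterator[bytes]:
    """Yield chunks from source, aborting uploads that are suspiciously slow."""
    start = clock()
    received = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # end of stream
            return
        received += len(chunk)
        elapsed = clock() - start
        if elapsed > max_seconds:
            raise SlowUploadError("upload exceeded the overall deadline")
        # Skip the rate check during a short grace period so that normal
        # connection start-up is not mistaken for an attack.
        if elapsed > grace_seconds and received / elapsed < min_bytes_per_second:
            raise SlowUploadError("upload rate is below the minimum")
        yield chunk


# Hypothetical usage: a fast in-memory upload passes through unchanged.
body = b"".join(guarded_chunks(io.BytesIO(b"abc" * 1000)))
```

The intermediary would iterate over `guarded_chunks(...)` while forwarding data, and close both the incoming and outgoing connections when `SlowUploadError` is raised.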
