Architecture – How to Avoid Network Bottlenecks in Microservice Systems

Tags: architecture, microservices, server

I've been reading a lot about microservice architectures for server applications, and have been wondering why internal network usage does not become a bottleneck or a significant disadvantage compared to a monolith architecture.

For the sake of precision, here are my interpretations of the two terms:

  1. Monolith architecture: One application in a single language that handles all of the functionality, data, etc. A load balancer distributes requests from the end user across multiple machines, each running one instance of our application.

  2. Microservice architecture: Many applications (microservices) each handling a small portion of the functionality and data. Each microservice exposes a common API that is accessed through the network (as opposed to inter-process communication or shared memory on the same machine). API calls are mostly stitched together on the server to produce a page, although perhaps some of this work is done by the client querying individual microservices.

To my naive imagination, it seems like a microservice architecture relies on slow network traffic instead of the faster resources available on the same machine (memory and disk). How does one ensure that querying APIs through the internal network will not slow down the overall response time?

Best Answer

Internal networks often use 1 Gbps connections, or faster. Optical fiber connections or bonding allow much higher bandwidths between servers. Now imagine the average size of a JSON response from an API. How many such responses can be transmitted over a 1 Gbps connection in one second?

Let's actually do the math. 1 Gbps is 125 000 KB per second. If an average JSON response is 5 KB (which is quite a lot!), you can send 25 000 responses per second through the wire with just one pair of machines. Not bad, is it?
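
Here is that calculation as a small Python sketch (the 5 KB average response size is an assumption for illustration, not a measurement):

```python
# Back-of-envelope math for one pair of machines on a 1 Gbps link.
# The 5 KB average response size is an assumption, not a measurement.

LINK_BITS_PER_SECOND = 1_000_000_000   # 1 Gbps
RESPONSE_BYTES = 5 * 1_000             # assumed average JSON response

bytes_per_second = LINK_BITS_PER_SECOND / 8          # 125 000 000 B/s
responses_per_second = bytes_per_second / RESPONSE_BYTES

print(f"{responses_per_second:,.0f} responses per second")  # 25,000
```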

This is why network connection is usually not the bottleneck.

Another aspect of microservices is that you can scale easily. Imagine two servers, one hosting the API, the other consuming it. If the connection ever becomes the bottleneck, just add another pair of servers and you double the throughput.

This matters when our earlier 25 000 responses per second becomes too small for the scale of the app. Add nine more pairs, and you can now serve 250 000 responses per second.

But let's get back to our pair of servers and do some comparisons.

  • If an average non-cached database query takes 10 ms, a single sequential connection is limited to 100 queries per second. 100 queries. 25 000 responses. Achieving 25 000 responses per second requires a great amount of caching and optimization (if the response actually needs to do something useful, like querying a database; "Hello World"-style responses don't qualify).

  • On my computer, right now, DOMContentLoaded for Google's home page happened 394 ms after the request was sent. That's less than 3 requests per second. For the Programmers.SE home page, it happened 603 ms after the request was sent. That's not even 2 requests per second. By the way, I have a 100 Mbps internet connection and a fast computer: many users will wait longer.

    If the bottleneck were the network speed between the servers, those two sites could literally make thousands of calls to different APIs while serving the page (see the sketch after this list).
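
To make the comparison concrete, here is a sketch of that back-of-envelope arithmetic, reusing the assumed numbers from above (10 ms per query, 5 KB per response, a 1 Gbps link, and the 394 ms page time); note that it counts only transfer time on the wire, not round-trip latency:

```python
# Compare two limits under the assumed numbers from the text:
# a 10 ms non-cached database query vs. the 1 Gbps wire capacity.
# Only transfer time is counted here, not round-trip latency.

DB_QUERY_SECONDS = 0.010                 # assumed query latency
RESPONSE_BYTES = 5 * 1_000               # assumed 5 KB response
LINK_BITS_PER_SECOND = 1_000_000_000     # 1 Gbps
PAGE_BUDGET_SECONDS = 0.394              # DOMContentLoaded time measured above

sequential_queries_per_second = 1 / DB_QUERY_SECONDS                   # 100
transfer_seconds_per_call = RESPONSE_BYTES * 8 / LINK_BITS_PER_SECOND  # ~40 µs
calls_within_budget = PAGE_BUDGET_SECONDS / transfer_seconds_per_call

print(f"{sequential_queries_per_second:.0f} sequential queries/s vs "
      f"{1 / transfer_seconds_per_call:,.0f} transfers/s on the wire")
print(f"≈ {calls_within_budget:,.0f} 5 KB transfers fit in the page budget")
```

Roughly 9 850 sequential 5 KB transfers fit in that 394 ms budget, which is why "thousands of calls" is not an exaggeration as far as raw bandwidth is concerned.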

Those two cases show that the network probably won't be your bottleneck in theory (in practice, you should do actual benchmarks and profiling to determine the exact location of the bottleneck of your particular system hosted on particular hardware). The time spent doing the actual work (be it SQL queries, compression, whatever) and sending the result to the end user is much more important.
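
As a starting point for such benchmarks, here is a minimal sketch of a sequential timing loop; the URL is a hypothetical placeholder to be replaced with one of your own internal endpoints:

```python
import time
import urllib.request

# Minimal sequential benchmark: time N calls to an endpoint and report
# per-call latency and implied single-client throughput. The URL is a
# hypothetical placeholder; point it at one of your own services.
URL = "http://localhost:8080/api/example"
N = 100

start = time.perf_counter()
for _ in range(N):
    with urllib.request.urlopen(URL) as response:
        response.read()
elapsed = time.perf_counter() - start

print(f"{elapsed / N * 1000:.1f} ms per call, "
      f"{N / elapsed:,.0f} calls/s from a single sequential client")
```

A single sequential client measures latency rather than capacity; to locate the real bottleneck you would run many concurrent clients and profile the server at the same time.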

Think about databases

Usually, databases are hosted separately from the web application using them. This can raise a concern: what about the connection speed between the server hosting the application and the server hosting the database?

There are indeed cases where the connection speed becomes problematic: when you store huge amounts of data that don't need to be processed by the database itself but must be available immediately (large binary files, for instance). But such situations are rare: in most cases, the transfer time is small compared to the time spent processing the query itself.

Where transfer speed actually matters is when a company hosts large data sets on a NAS, and the NAS is accessed by multiple clients at the same time. This is where a SAN can be a solution. That said, it is not the only one. Cat 6 cables support speeds up to 10 Gbps; bonding can also be used to increase throughput without changing the cables or network adapters. Other solutions exist, involving data replication across multiple NAS devices.

Forget about speed; think about scalability

An important point of a web app is to be able to scale. While raw performance matters (because nobody wants to pay for more powerful servers), scalability is much more important, because it lets you throw additional hardware at the problem when needed.

  • If your app isn't particularly fast, you'll lose money because you'll need more powerful servers.

  • If you have a fast app which can't scale, you'll lose customers because you won't be able to respond to increasing demand.

In the same way, virtual machines were perceived a decade ago as a huge performance problem. Indeed, hosting an application directly on a server versus hosting it in a virtual machine had a significant performance impact. While the gap is much smaller today, it still exists.

Despite this performance loss, virtual environments became very popular because of the flexibility they give.

As with network speed, you may find that the VM is the actual bottleneck and that, at your scale, you would save a lot of money by hosting your app directly on the hardware, without VMs. But this is not what happens for 99.9% of apps: their bottleneck is somewhere else, and the loss of a few microseconds to the VM is easily compensated by the benefits of hardware abstraction and scalability.