Nginx – Does this prove a network bandwidth bottleneck?

Tags: apache-2.2, nginx, performance

I incorrectly assumed that my internal ApacheBench testing meant my server could handle 1k concurrency at 3k hits per second.

My theory at the moment is that the network is the bottleneck: the server can't send data out fast enough.

External testing from blitz.io at 1k concurrency shows my hits/s capping at 180, with pages taking longer and longer to respond because the server can only return 180 responses per second.


I've served a blank file from nginx and benched it: it scales 1:1 with concurrency.


Now to rule out IO / memcached bottlenecks (nginx normally pulls from memcached), I serve up a static version of the cached page from the filesystem.
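For reference, that test amounts to swapping the memcached-backed location for a plain static one, along the lines of the sketch below (keys, ports, and paths are placeholders rather than my actual config, and only one of the two locations would be active at a time):

    # normal setup: page body fetched from memcached
    location / {
        set $memcached_key "$uri";
        memcached_pass 127.0.0.1:11211;
        default_type  text/html;
    }

    # test setup: the same page served as a static file from disk
    location / {
        root /var/www/static-test;
        try_files /cached-page.html =404;
    }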


The results are very similar to my original test; I'm capped at around 180 RPS.

Splitting the HTML page in half gives me double the RPS, so it's definitely limited by the size of the page.
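That inference follows directly from the numbers: if halving the page doubles the requests per second, the byte throughput over the wire stays constant, which is the signature of a bytes-per-second ceiling rather than a requests-per-second ceiling.

    full page:  180 req/s * S bytes    = 180*S bytes/s
    half page:  360 req/s * S/2 bytes  = 180*S bytes/s   (same wire throughput)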


If I run ApacheBench locally on the server itself, I get consistent results of around 4k RPS on both the Full Page and the Half Page, at a high transfer rate:

    Transfer rate: 62586.14 [Kbytes/sec] received

If I AB from an external server, I get around 180RPS – same as the blitz.io results.
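For reference, both runs were plain ApacheBench invocations along these lines (the URL, request count, and concurrency here are placeholders rather than the exact commands used):

    # run locally on the server itself (loopback, no real network involved)
    ab -n 10000 -c 100 http://127.0.0.1/cached-page.html

    # run from an external box (traffic goes over the real outbound link)
    ab -n 10000 -c 100 http://my-server.example.com/cached-page.html

The only difference between the two is whether the responses actually cross the outbound link.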

How do I know it's not intentional throttling?

If I benchmark from multiple external servers, all results are poor, which leads me to believe the problem is with MY server's outbound traffic, not a download-speed issue on my benchmarking servers / blitz.io.

So I'm back to my conclusion that my server can't send data fast enough.

Am I right? Are there other ways to interpret this data? Is the solution/optimization to set up multiple servers behind a load balancer, each serving 180 hits per second?

I'm quite new to server optimization, so I'd appreciate any confirmation interpreting this data.


Outbound traffic

Here's more information about the outbound bandwidth: The network graph shows a maximum output of 16 Mb/s: 16 megabits per second. Doesn't sound like much at all.

Following a suggestion about throttling, I looked into this and found that Linode has a 50 Mbit/s cap (which, apparently, I'm not even close to hitting). I had it raised to 100 Mbit/s.

Since Linode caps my traffic and I'm not even hitting the cap, does this mean my server should indeed be capable of pushing out up to 100 Mbit/s but is limited by some other internal bottleneck? I just don't understand how networks at this scale work: can servers literally send data as fast as they can read it from the HDD? Is the network pipe that big?
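For a rough sense of scale (the disk number is a generic ballpark, not something measured on this box): even the raised cap is far below what a single disk can deliver sequentially, so at these speeds the pipe, not the disk, is the narrow end.

    100 Mbit/s = 100 / 8 = 12.5 MB/s
    typical HDD sequential read ≈ 100+ MB/s   (ballpark, varies by drive)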



In conclusion

1: Based on the above, I'm thinking I can definitely raise my 180 RPS by adding an nginx load balancer in front of a multi-nginx-server setup, at 180 RPS per server behind the LB (a config sketch follows this list).

2: If Linode has a 50/100 Mbit cap that I'm not hitting at all, there must be something I can do to reach that limit with my single-server setup. If I can read and transmit data fast enough locally, and Linode even bothers to impose a 50/100 Mbit cap, then there must be an internal bottleneck keeping me from reaching those caps, and I'm not sure how to detect it (some ways to measure this are sketched below). Correct?
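For point 1, the load-balancing idea would look roughly like the sketch below. The upstream addresses are placeholders and this is only a sketch of the approach, not a config I'm running:

    # nginx load balancer fronting several identical page servers (sketch)
    upstream page_servers {
        server 10.0.0.11;
        server 10.0.0.12;
    }

    server {
        listen 80;
        location / {
            proxy_pass http://page_servers;
        }
    }

For point 2, one way to look for the internal bottleneck is simply to watch what the NIC is actually pushing while an external benchmark runs (assuming eth0 is the public interface; any of these common tools would do):

    # live per-connection throughput on the public interface
    iftop -i eth0
    # or a simple per-second summary
    ifstat 1
    # or raw end-to-end capacity independent of nginx (iperf3 needed on both ends)
    iperf3 -s                    # on the server
    iperf3 -c my.server.example  # from the external benchmarking box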

I realize the question is huge and vague now, but I'm not sure how to condense it. Any input is appreciated on any conclusion I've made.

Best Answer

The issue was that I assumed the linode.com graph peaks were true peaks. It turns out the graph uses 5-minute average data points, so my peak appeared to be 24 Mbit when in fact I was hitting the 50 Mbit cap.
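To make the averaging effect concrete (the traffic pattern below is made up purely for illustration): a short burst at the cap averages down to something that looks comfortably under it.

    50 Mbit/s for 2.4 min, near 0 for the remaining 2.6 min
    5-minute average = (2.4 / 5.0) * 50 ≈ 24 Mbit/s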

Now that they have raised it to 100mbits, my benchmarks immediately went up to the new outbound traffic limit.

If only I had noticed that earlier! Because of that graph, a lot of my reasoning hinged on the idea that I was not hitting an outbound traffic limit.

Now I peak at 370 requests per second, which is right under 100 Mbit/s; at that point I start getting a "backlog" of requests, and response times start going up.
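As a sanity check (the per-response size is inferred from these numbers, not measured directly):

    100 Mbit/s / 370 req/s ≈ 0.27 Mbit ≈ 34 KB per response
    at the old 50 Mbit cap: 50 / 0.27 ≈ 185 req/s

which lines up with the ~180 RPS ceiling I was seeing before the cap was raised.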


I can now increase maximum concurrency by shrinking the page; with gzip enabled I get 600 RPS.
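For what it's worth, the gzip side of that is just standard nginx compression, roughly like this (the compression level and types are assumptions to tune, not my exact settings):

    # compress text responses before they hit the (bandwidth-limited) wire
    gzip            on;
    gzip_comp_level 5;
    gzip_min_length 1024;
    # text/html is compressed by default; add other text types as needed
    gzip_types      text/css application/javascript application/json;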


I do still run into problems when I suddenly peak and the backlog of pending requests (limited by bandwidth) starts to accumulate, but that sounds like a different question.


It's been a great lesson in optimization / reading this data / narrowing down the possible problems. Thank you so much for your input!