NGINX HTTPS Reverse Proxy – Fast TTFB but low concurrency

Tags: https, nginx, ssl, varnish

I've got an application that runs:
NGINX (SSL) => VARNISH (CACHE) => APACHE/PHP.

Running an ab benchmark on an EC2 t2.small instance, I'm able to achieve 30k+ requests/second against the Varnish layer (via HTTP). However, when I run the test through NGINX (HTTPS), I'm only able to push 160 requests/second (average TTFB of 43 ms from the public web).
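
For context, the numbers below come from ab runs along these lines (the URLs and counts here are representative of my setup, not copied commands):

# plain HTTP straight at the Varnish layer (the same port nginx proxies to)
ab -n 1000 -c 10 http://127.0.0.1:79/

# HTTPS through the nginx terminator
ab -n 1000 -c 10 https://xyz.com/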

@ nginx.conf

user  nginx;
worker_processes  auto;

worker_rlimit_nofile 65535;

error_log  /var/log/nginx/error.log;

pid        /var/run/nginx.pid;


events {
    worker_connections  16024;
    multi_accept        on;
}

and at the http level:

sendfile        on;
tcp_nopush     on;

keepalive_timeout  10;


ssl_session_cache shared:SSL:10m;
ssl_session_timeout 10m;

@ domain.conf

server {
        listen 443 ssl;

        server_name xyz.com;
        ssl_certificate /home/st/ssl3/xyz.crt;
        ssl_certificate_key /home/xyz/ssl3/xyz.key;

        ssl_protocols TLSv1 TLSv1.1 TLSv1.2;
        ssl_prefer_server_ciphers on;
        ssl_ciphers ECDH+AESGCM:ECDH+AES256:ECDH+AES128:DH+3DES:!ADH:!AECDH:!MD5;

        ssl_session_tickets on;

        location / {
                proxy_buffers 8 8k;
                proxy_buffer_size 2k;

                proxy_pass http://127.0.0.1:79;
                proxy_set_header X-Real-IP $remote_addr;
                proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
                proxy_set_header X-Forwarded-Proto https;
                proxy_set_header X-Forwarded-Port 443;
                proxy_set_header Host $host;

                proxy_redirect off;
        }

    add_header Strict-Transport-Security "max-age=63072000; includeSubdomains; preload";
}

Here is the benchmark for Apache directly

INTERNAL => @APACHE:

Concurrency Level:      10
Time taken for tests:   0.694 seconds
Complete requests:      1000
Failed requests:        0
Write errors:           0
Non-2xx responses:      1002
Keep-Alive requests:    996
Total transferred:      705122 bytes
HTML transferred:       401802 bytes
Requests per second:    1440.93 [#/sec] (mean)
Time per request:       6.940 [ms] (mean)
Time per request:       0.694 [ms] (mean, across all concurrent requests)
Transfer rate:          992.22 [Kbytes/sec] received

Here is the benchmark for Varnish (it was running at 20–30k earlier – I've used up my CPU credits, so the average at the moment is 4–8k rps)

INTERNAL => @VARNISH => @APACHE:

Concurrency Level:      10
Time taken for tests:   0.232 seconds
Complete requests:      1000
Failed requests:        0
Write errors:           0
Keep-Alive requests:    0
Total transferred:      23439800 bytes
HTML transferred:       23039412 bytes
Requests per second:    4310.16 [#/sec] (mean)
Time per request:       2.320 [ms] (mean)
Time per request:       0.232 [ms] (mean, across all concurrent requests)
Transfer rate:          98661.39 [Kbytes/sec] received

Here is the benchmark for NGINX via HTTP

INTERNAL => @NGINX[HTTP] => @VARNISH => @APACHE:

Concurrency Level:      10
Time taken for tests:   0.082 seconds
Complete requests:      1000
Failed requests:        0
Write errors:           0
Non-2xx responses:      1001
Keep-Alive requests:    1000
Total transferred:      382382 bytes
HTML transferred:       184184 bytes
Requests per second:    12137.98 [#/sec] (mean)
Time per request:       0.824 [ms] (mean)
Time per request:       0.082 [ms] (mean, across all concurrent requests)
Transfer rate:          4532.57 [Kbytes/sec] received

Here is the benchmark for NGINX via HTTPS

INTERNAL => @NGINX[HTTPS=>HTTP] => @VARNISH => @APACHE:

Concurrency Level:      10
Time taken for tests:   7.029 seconds
Complete requests:      1000
Failed requests:        0
Write errors:           0
Non-2xx responses:      1000
Keep-Alive requests:    0
Total transferred:      663000 bytes
HTML transferred:       401000 bytes
Requests per second:    142.27 [#/sec] (mean)
Time per request:       70.288 [ms] (mean)
Time per request:       7.029 [ms] (mean, across all concurrent requests)
Transfer rate:          92.12 [Kbytes/sec] received

Best Answer

Well, from the information you have (and have not) provided, I can only guess. But judging from the instance type (t2 instances have burstable, credit-based performance and get about 20% of a core once the credits run out; it's not a good instance to benchmark on) and the use of ab for testing (by the way, when you write it as 'AB testing', the first thing that comes to mind is A/B testing), I'd say your performance is pretty much as expected.

When starting an SSL or TLS session, the most performance-intensive task is not the data encryption/decryption but the key exchange. As ab does not use SSL session caching, the key exchange has to be done on every connection.

Depending on the cipher/kex/auth suite actually used (I can't tell from the ab output provided), that may be quite a lot of work for the CPU. And since both ends are on the same machine, you double the CPU requirements per connection (that's a simplification, but good enough here).
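
If you want to see which suite is actually negotiated, and whether the server resumes sessions at all, openssl will tell you (substitute your own hostname):

# shows the negotiated protocol and cipher
openssl s_client -connect xyz.com:443 -servername xyz.com < /dev/null 2>&1 | grep -E 'Protocol|Cipher'

# reconnects 5 times with the same session ID; 'Reused' lines mean resumption works
openssl s_client -connect xyz.com:443 -reconnect < /dev/null 2>&1 | grep -c Reused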

In real-world use, keep-alives might get you better performance (it depends on the client; normal browsers use them – try ab -k). And you will get better performance from the SSL session caching you mentioned (again, it depends on the client; normal browsers support it).
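
That is, something like:

# same load, but with HTTP keep-alive between ab and nginx
ab -n 1000 -c 10 -k https://xyz.com/

The Keep-Alive requests line in ab's output tells you whether it actually took effect.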

There are several other ways to improve your performance. You can, of course, get better hardware. You can optimize your key sizes (depending on the level of protection the app requires) – smaller keys are usually cheaper to work with. Testing from a different machine may or may not improve the apparent performance. And a different OpenSSL build, or a different SSL library altogether, could also perform better.
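
As a rough sketch of the key-size point (file names are placeholders): a 2048-bit RSA key is several times cheaper per handshake than a 4096-bit one, and an ECDSA P-256 key is cheaper still on the server side.

# 2048-bit RSA key
openssl genrsa -out xyz-2048.key 2048

# or an ECDSA P-256 key
openssl ecparam -genkey -name prime256v1 -out xyz-ec.key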

Just for reference, you can take a look at this paper by Intel. They compare performance on a highly optimized machine (with some optimized software). Keep in mind that you have less than 1/30 of their computing power available (it could be as low as 1/150 if you are out of credits).

Though, if you need high-performance SSL, it might be worth considering using Amazon ELB to do the SSL termination for you, since you are on EC2 already.

Edit: For example, Apache JMeter uses SSL context caching, and httperf does as well. I find JMeter especially good at simulating realistic loads, but for this case httperf's way of session caching could work best.
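
A sketch of such an httperf run against the HTTPS endpoint (hostname, rate and counts are placeholders; the point is that httperf caches the SSL session across its connections):

httperf --server xyz.com --port 443 --ssl --uri / \
        --num-conns 1000 --num-calls 10 --rate 100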

Not seeing any difference with -k may be because keep-alive is still not actually used. That depends on concurrency settings and (at least on my machine) it seems to depend on the URL as well. ab does not use keep-alives if I put a domain name that points to more than one IP in the URL (don't ask me why).

It depends on your perception of massive, but I would not expect more than about 500 connections per second in bursts on this rather small instance, and not more than 250 cps sustained.

Comparing Varnish plaintext HTTP to nginx SSL is comparing pears to apples. Or rather, comparing blueberries to watermelons in terms of hardware requirements.

Again, for your reference (notice the Keep-Alive requests: 100 line).

Without -k

Concurrency Level:      1
Time taken for tests:   0.431 seconds
Complete requests:      100
Failed requests:        0
Total transferred:      399300 bytes
HTML transferred:       381200 bytes
Requests per second:    232.26 [#/sec] (mean)
Time per request:       4.305 [ms] (mean)
Time per request:       4.305 [ms] (mean, across all concurrent requests)
Transfer rate:          905.69 [Kbytes/sec] received

With -k

Concurrency Level:      1
Time taken for tests:   0.131 seconds
Complete requests:      100
Failed requests:        0
Keep-Alive requests:    100
Total transferred:      402892 bytes
HTML transferred:       381200 bytes
Requests per second:    762.11 [#/sec] (mean)
Time per request:       1.312 [ms] (mean)
Time per request:       1.312 [ms] (mean, across all concurrent requests)
Transfer rate:          2998.53 [Kbytes/sec] received

Edit 2: Well, you need to understand that serving content directly from memory (which is what Varnish is doing) is about as easy as it gets. You parse the headers, you find the content in memory, you spit it out. Varnish excels at this.

Establishing an encrypted connection is a completely different level of work. Once you add nginx, it has to do the SSL handshake (key exchange, authentication) and encryption, which require far more resources. Then it parses the headers. Then it has to open another TCP connection to Varnish.
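
That last hop is cheap compared to the TLS work, but if you want nginx to reuse its connections to Varnish instead of opening a new one per request, an upstream keepalive pool does it. A sketch, reusing the port from your proxy_pass (the upstream name is arbitrary):

upstream varnish {
    server 127.0.0.1:79;
    keepalive 32;                       # idle connections kept open to Varnish
}

# and inside the existing location /:
proxy_pass http://varnish;
proxy_http_version 1.1;                 # upstream keepalive requires HTTP/1.1
proxy_set_header Connection "";         # don't forward the client's Connection header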

Again, in the aforementioned Intel paper, they used 28 cores and did some tweaking of their OpenSSL build to reach 38k HTTPS cps (a little more than your Varnish performance). You have about 1/5 of a core and are affected by your virtual neighbours.

Quoting the Amazon EC2 instance documentation:

For example, a t2.small instance receives credits continuously at a rate of 12 CPU Credits per hour. This capability provides baseline performance equivalent to 20% of a CPU core.

And yet another paper from nginx themselves:

Summary of Results: A single virtualized Intel core can typically perform up to 350 full 2048-bit SSL handshake operations per second, using modern cryptographic ciphers. This equates to several hundred new users of your service per second per core.