Nginx proxy response time is very slow

nginx, tcp, varnish

Recently we added nginx in front of varnish for SSL offloading, and we were about to rewrite all our HTTP requests to HTTPS. But then we found a significant increase in response time, even for plain HTTP calls, when they are served through nginx, whereas the same requests served by varnish directly (without nginx) were a lot faster.
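The vhost itself is pulled in from sites-enabled and isn't shown below; stripped down it's essentially just SSL termination in front of varnish, roughly like this (the server name, certificate paths and varnish address are placeholders):

server {
    listen 443 ssl;
    listen 80;
    server_name example.com;                         # placeholder

    ssl_certificate     /etc/nginx/ssl/example.pem;  # placeholder paths
    ssl_certificate_key /etc/nginx/ssl/example.key;

    location / {
        proxy_pass http://127.0.0.1:6081;            # varnish listening locally (placeholder address)
    }
}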

I tweaked the proxy buffers (2048 4k) so the response is not spooled to a temporary file, and I also tried turning off proxy buffering, but neither approach helped. So I cloned the nginx server (a virtual machine) and issued the same request against the clone; there the response time was on par with varnish.
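Concretely, the two attempts were (applied separately, inside the location that serves the slow request; the full buffer settings are also shown further down):

# attempt 1: enough 4k buffers (~8 MB) that the response stays in memory and never hits proxy_temp
proxy_buffers 2048 4k;

# attempt 2: no response buffering at all; nginx passes data to the client as it arrives from varnish
proxy_buffering off;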

So it seems that when nginx is under load (about 700 requests/sec), the response time increases.

Can someone tell me if I'm missing something obvious?

Here's my nginx config

#nginx.conf
worker_processes auto;
worker_rlimit_nofile 90000;
pid        /var/run/nginx.pid;
error_log  /var/log/nginx/error.log error;

events {
    worker_connections 40000;
    multi_accept on;
    use epoll;
}


http {

    ##
    # Basic Settings
    ##

    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout 65;
    types_hash_max_size 2048;
    client_max_body_size 20M;
    client_body_buffer_size 128k;
    server_tokens off;
    keepalive_requests 1000;
    reset_timedout_connection on;

    include       /etc/nginx/mime.types;
    default_type  application/octet-stream;

    ##
    # SSL common settings
    ##
    include /etc/nginx/include.d/ssl-common;


    ##
    # Logging Settings
    ##

    log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
                  '$status $body_bytes_sent "$http_referer" '
                  '"$http_user_agent" "$http_x_forwarded_for"';

    log_format detailed '$remote_addr - $remote_user [$time_local] '
                '"$request" $status $body_bytes_sent "$http_referer" '
                '"$http_user_agent" $request_length $request_time '
                '$upstream_response_length $upstream_response_time '
                '$upstream_status';

    log_format upstreamlog '[$time_local] $remote_addr - $remote_user - $server_name '
                'to: $upstream_addr: $status / upstream $upstream_status $request '
                'upstream_response_time $upstream_response_time msec $msec '
                'request_time $request_time body: $request_body';

    log_format timed_combined '$remote_addr - $remote_user [$time_local] '
'"$request" $status $body_bytes_sent '
'$upstream_connect_time $upstream_header_time '
'$request_time $upstream_response_time $pipe';

    access_log off;


    ##
    # Gzip Settings
    ##

    gzip on;
    gzip_disable "msie6";

    #Proxy config

    proxy_buffering on;
    proxy_buffers 56 4k;
    proxy_busy_buffers_size 8k;

    proxy_set_header Host $host;
    proxy_http_version 1.1;
    proxy_set_header Connection ""; 
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

    ##
    # Virtual Host Configs
    ##

    include /etc/nginx/conf.d/*.conf;
    include /etc/nginx/sites-enabled/*;
}


#For a specific request I've increased the proxy buffer size
proxy_buffers 2048 4k;
proxy_buffer_size 4k;
proxy_busy_buffers_size 8k;


#Upstream setting
keepalive 2000;
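That keepalive directive sits inside the upstream block that the vhosts proxy_pass to, roughly like this (the upstream name and varnish address are placeholders):

upstream varnish_backend {
    server 127.0.0.1:6081;    # varnish instance (placeholder address)
    keepalive 2000;           # idle connections to varnish kept open per worker process
}

It only has an effect together with proxy_http_version 1.1 and the cleared Connection header, both of which are already set in the http block above.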

I've even tuned the TCP settings in sysctl.conf, and that doesn't help either. Here's my sysctl.conf:

#sysctl.conf

fs.file-max = 100000
net.ipv4.ip_local_port_range = 1024 65000
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_tw_recycle = 0
net.ipv4.tcp_slow_start_after_idle = 0
net.ipv4.tcp_fin_timeout = 15
net.ipv4.tcp_max_tw_buckets = 400000
net.ipv4.tcp_no_metrics_save = 1
net.ipv4.tcp_syn_retries = 2
net.ipv4.tcp_synack_retries = 2
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
net.ipv4.tcp_max_syn_backlog = 65536
net.core.somaxconn = 16384
net.core.netdev_max_backlog = 16384
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
vm.min_free_kbytes = 65536

Here's the graph for disk I/O.
Note: I only added the TCP connection stats recently, so there's not much information there yet.

Nginx_status – graph showing requests and connections

Best Answer

Anything in the error log?

I recently had an issue where the system was running out of file handles because nginx was buffering every request and upstream keepalives weren't enabled. I wouldn't have thought you'd hit those problems with your settings, though.

I don't know what your hardware looks like, but 700 SSL requests a second is pretty heavy. Is there any iowait? Is the CPU maxed out? keepalive_timeout 65 also looks high for that level of traffic; you could be running out of TCP connections. I've found the best results at ~300 req/s with a very short timeout, around 1-3 seconds, but that will depend on your load.
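If you want to see where the time actually goes, one option is to shorten the client keepalive and switch on the timed_combined log you already have defined, something along these lines (the values and log path are only a starting point):

keepalive_timeout 3;        # much shorter client keepalive under heavy traffic
keepalive_requests 1000;
access_log /var/log/nginx/timed.log timed_combined;    # logs $request_time next to $upstream_response_time

Comparing $request_time with $upstream_response_time per request should tell you whether the extra latency is spent waiting on varnish or inside nginx/TLS on the client side.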