Nginx thousands of connections

nginx

I have a server running nginx (1.10.2). Since we migrated a few dozen websites onto it, it keeps "crashing": the process is still up, but it stops answering requests and stops logging anything until it is reloaded or restarted.

Here is the status page just after a reload:

Active connections: 9432 
server accepts handled requests
 550310 550310 657656 
Reading: 0 Writing: 3280 Waiting: 6150 

That seems ridiculous, and the access logs don't show nearly enough requests to justify it. Wondering if it might be something like a Slowloris attack, I tried lowering client_body_timeout and client_header_timeout to 5s. The count seems to climb more slowly now, but I can't be sure.
Any ideas on what I could do to keep nginx from dying every few hours?

I already disabled keepalive, just in case; it didn't change a thing.

EDIT: The nginx.conf:

pid /var/run/nginx.pid;
user www-data;
worker_processes 12;
worker_rlimit_nofile 300000;
error_log /var/log/nginx/error.log;
events
{
    multi_accept on;
    use epoll;
    worker_connections 2048;
}

http
{
    index index.html index.htm index.php;
    server_tokens off;
    include /etc/nginx/mime.types;
    include /etc/nginx/conf.d/*.conf;
    include /etc/nginx/sites-enabled/*;

    client_max_body_size 100M;
    keepalive_timeout 0;

    gzip on;
    gzip_http_version 1.1;
    gzip_vary on;
    gzip_comp_level 6;
    gzip_proxied any;
    gzip_buffers 16 8k;
    gzip_disable "MSIE [1-6]\.(?!.*SV1)";
    gzip_types
        text/plain
        text/css
        text/js
        text/xml
        text/javascript
        application/javascript
        application/x-javascript
        application/json
        application/xml
        application/rss+xml
        image/svg+xml
        font/opentype
        image/gif
        image/jpeg
        image/png      
        image/bmp
        image/x-icon;
}

And one of the sites:

server
{
    listen              80;

    root                /home/sitename/www;

    server_name             sitename.com;

    access_log              /var/log/nginx/sitename_access.log;
    error_log               /var/log/nginx/sitename_error.log;

    pagespeed                           on;
    pagespeed                           FileCachePath "/tmp/pagespeed/sitename";

    include                             "pagespeed.conf";

    if ($host != 'www.sitename.com')
    {
      return 301 $scheme://www.sitename.com$request_uri;
    }

    location /
    {
        index                           index.php index.html index.htm;
        try_files                       $uri $uri/ /index.php?$args;
    }

    location ~ \.php$
    {
        fastcgi_pass            fastcgi_sitename;
        fastcgi_index           index.php;
        include                 fastcgi.conf;
    }
}

pagespeed.conf is only a bunch of filters. Some sites have a few other location blocks, some have https, but it's nothing I haven't done on hundreds of other production servers without any problems.

Best Answer

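You probably hit the connection limit. Each worker accepts at most worker_connections connections, so the total is worker_processes * worker_connections (12 * 2048 = 24576 with your configuration), and that figure also counts the connections nginx opens to upstreams such as FastCGI. When you reload nginx, it spawns new worker processes and lets the old ones finish their requests and exit, so the connection count starts over. That is why a reload clears the problem for a while.

Use netstat to check where the connections come from: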

netstat -t -n -v | grep ESTABLISHED

You can then try to set client timeouts.
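
A minimal sketch of what such timeouts could look like in the http block. The values are illustrative assumptions, not tuned recommendations; the 5s figures simply mirror what you already tried, and send_timeout is the one that applies to connections stuck in the Writing state:

```nginx
http
{
    # Abort if the client does not send the whole request header in time (default 60s).
    client_header_timeout     5s;
    # Maximum gap between two successive reads of the request body (default 60s).
    client_body_timeout       5s;
    # Maximum gap between two successive writes to the client (default 60s).
    # Illustrative value; raise it if you serve large files to slow clients.
    send_timeout              10s;
    # Reset timed-out connections so their memory is freed promptly.
    reset_timedout_connection on;
}
```

If the Writing count keeps growing even with a short send_timeout, the stalled side is more likely the backend than the clients.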

Looking at your server status, I can see 2 important things:

  • Writing: 3280 (The current number of connections where nginx is writing the response back to the client.)
  • Waiting: 6150 (The current number of idle client connections waiting for a request.)

So you have 3280 clients receiving a response from your web server and 6150 clients doing nothing - perhaps keepalive?

The access log is written only when a request finishes. If you have 3000 clients waiting for a response, you will not see them in the access log until they have received the whole response. However, you can work around this by writing a small Lua script that you call via access_by_lua.
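
A rough sketch of that workaround, assuming nginx is built with the third-party ngx_http_lua_module (it is not part of stock nginx). It uses the block form access_by_lua_block and reuses the PHP location from the question; the log message text is made up for illustration:

```nginx
location ~ \.php$
{
    # Runs in the access phase, before the request is passed to FastCGI,
    # so a line is written even if the response never completes.
    access_by_lua_block {
        ngx.log(ngx.NOTICE, "request started: ", ngx.var.request_method,
                " ", ngx.var.host, ngx.var.request_uri)
    }

    fastcgi_pass            fastcgi_sitename;
    fastcgi_index           index.php;
    include                 fastcgi.conf;
}
```

ngx.log writes to the error log, not the access log, so the matching error_log directive needs a level of notice or lower (for example error_log /var/log/nginx/sitename_error.log notice;) for these lines to show up.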

You can also tune your access log to record how long each request takes via $request_time. See the documentation for more variables: http://nginx.org/en/docs/http/ngx_http_log_module.html#log_format
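
As a sketch of that logging change: the format name "timed" and the rt=/urt= labels are arbitrary choices, and $upstream_response_time is added on the assumption that you want to separate time spent in the FastCGI backend from total request time.

```nginx
http
{
    # log_format is only valid in the http context.
    log_format timed '$remote_addr - $remote_user [$time_local] "$request" '
                     '$status $body_bytes_sent '
                     'rt=$request_time urt=$upstream_response_time';

    server
    {
        # Same log file as in the question, now written with the "timed" format.
        access_log /var/log/nginx/sitename_access.log timed;
    }
}
```

Requests where rt is large and urt is close to it point at a slow backend; a large rt with a small urt points at slow clients.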