I have a server running nginx (1.10.2). Since we migrated a few dozen websites onto it, it keeps "crashing". The process is still up, but it stops answering requests or logging anything until a reload or a restart is done.
Looking at the status page just after a reload:
Active connections: 9432
server accepts handled requests
550310 550310 657656
Reading: 0 Writing: 3280 Waiting: 6150
That seems ridiculous, and looking at the access logs there aren't enough requests to justify those numbers. Wondering if it might be something like a Slowloris attack, I tried lowering client_body_timeout and client_header_timeout to 5s; it looks like the count is climbing more slowly now, but who knows.
Any ideas what I could do to prevent nginx from dying every few hours?
I already disabled keepalive, just in case; it didn't change a thing.
EDIT: The nginx.conf:
pid /var/run/nginx.pid;
user www-data;
worker_processes 12;
worker_rlimit_nofile 300000;
error_log /var/log/nginx/error.log;
events
{
multi_accept on;
use epoll;
worker_connections 2048;
}
http
{
index index.html index.htm index.php;
server_tokens off;
include /etc/nginx/mime.types;
include /etc/nginx/conf.d/*.conf;
include /etc/nginx/sites-enabled/*;
client_max_body_size 100M;
keepalive_timeout 0;
gzip on;
gzip_http_version 1.1;
gzip_vary on;
gzip_comp_level 6;
gzip_proxied any;
gzip_buffers 16 8k;
gzip_disable "MSIE [1-6]\.(?!.*SV1)";
gzip_types
text/plain
text/css
text/js
text/xml
text/javascript
application/javascript
application/x-javascript
application/json
application/xml
application/rss+xml
image/svg+xml
font/opentype
image/gif
image/jpeg
image/png
image/bmp
image/x-icon;
}
And one of the sites:
server
{
listen 80;
root /home/sitename/www;
server_name sitename.com;
access_log /var/log/nginx/sitename_access.log;
error_log /var/log/nginx/sitename_error.log;
pagespeed on;
pagespeed FileCachePath "/tmp/pagespeed/sitename";
include "pagespeed.conf";
if ($host != 'www.sitename.com')
{
return 301 $scheme://www.sitename.com$request_uri;
}
location /
{
index index.php index.html index.htm;
try_files $uri $uri/ /index.php?$args;
}
location ~ \.php$
{
fastcgi_pass fastcgi_sitename;
fastcgi_index index.php;
include fastcgi.conf;
}
}
pagespeed.conf is only a bunch of filters. Some sites have a few other location blocks, some have HTTPS, but nothing I haven't done on hundreds of other production servers without any problems.
Best Answer
You probably hit the maximum connections limit (worker_processes * worker_connections; with your config, 12 * 2048 = 24,576). When you reload nginx, it spawns new worker processes and lets the old ones finish their requests and exit, which is why your connection count gets reset.
Try checking with netstat where the connections come from:
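A sketch of what that check might look like (exact netstat flags vary between distributions; `ss` from iproute2 is the modern equivalent, shown second):

```shell
# Count ESTABLISHED connections per remote IP, busiest clients first.
# $6 is the State column, $5 the Foreign Address (ip:port).
netstat -nt | awk '$6 == "ESTABLISHED" {split($5, a, ":"); print a[1]}' \
  | sort | uniq -c | sort -rn | head

# Same idea with ss: the state filter drops the State column,
# so the peer address is field 4; strip the trailing :port.
ss -tn state established | awk 'NR > 1 {sub(/:[0-9]+$/, "", $4); print $4}' \
  | sort | uniq -c | sort -rn | head
```

If a handful of IPs own thousands of those connections, that points at misbehaving clients rather than genuine load.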
You can then try to set client timeouts:
http://nginx.org/en/docs/http/ngx_http_core_module.html#client_body_timeout
http://nginx.org/en/docs/http/ngx_http_core_module.html#client_header_timeout
Looking at your server status (Writing: 3280, Waiting: 6150), I can see 2 important things:
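For instance (5s is the value the question already tried; the nginx defaults are 60s, and send_timeout is an extra suggestion of mine, not something the answer mentions):

```nginx
http {
    # Drop slow clients quickly instead of letting them occupy connections
    client_header_timeout 5s;   # max time to receive the full request headers
    client_body_timeout   5s;   # max gap between two reads of the request body
    send_timeout          10s;  # max gap between two writes of the response
}
```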
So you have 3280 clients receiving a response from your webserver and 6150 clients doing nothing, perhaps keepalive?
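If those idle connections turn out to come from a few IPs, one common Slowloris mitigation (not mentioned above, so treat this as a sketch with assumed limits) is a per-IP connection cap via ngx_http_limit_conn_module:

```nginx
http {
    # "addr" is an arbitrary zone name; a 10m zone stores roughly 160k states
    limit_conn_zone $binary_remote_addr zone=addr:10m;

    server {
        # Allow at most 20 simultaneous connections per client IP (assumed value)
        limit_conn addr 20;
    }
}
```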
The access log is written only when a request finishes. If you have 3000 clients waiting for a response, you will not see them in the access log until they receive the whole response. However, you can work around this problem by writing a small Lua script, which you call via access_by_lua.
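A minimal sketch of that idea, assuming nginx is built with lua-nginx-module (or you run OpenResty); the log message format here is made up:

```nginx
location / {
    # Runs in the access phase, i.e. before the response is produced,
    # so even stuck requests show up immediately. Note that ngx.log()
    # writes to the error log, so error_log must allow the info level.
    access_by_lua_block {
        ngx.log(ngx.INFO, "accepted: ", ngx.var.remote_addr,
                " ", ngx.var.request_method, " ", ngx.var.request_uri)
    }
    try_files $uri $uri/ /index.php?$args;
}
```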
You can also tune your access log to record your application's processing time via $request_time. Please check the documentation for more variables: http://nginx.org/en/docs/http/ngx_http_log_module.html#log_format
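For example, extending the default combined format with timing fields (the format name "timed" is my own; $upstream_response_time is an extra variable the answer doesn't mention, useful here to tell a slow PHP backend apart from a slow client):

```nginx
http {
    # The default "combined" format, plus total request time as seen by
    # nginx ($request_time) and time spent waiting on the upstream alone.
    log_format timed '$remote_addr - $remote_user [$time_local] '
                     '"$request" $status $body_bytes_sent '
                     '"$http_referer" "$http_user_agent" '
                     '$request_time $upstream_response_time';

    access_log /var/log/nginx/access.log timed;
}
```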