My nginx keeps crashing and reporting "bad gateway" errors in the browser. Nginx and PHP-FPM don't come preconfigured to handle large traffic loads. I had to put a systemctl restart php7.0-fpm cron job in place each hour just to make sure my sites don't stay down for too long when they go down. Let's just get down to it.
Some errors I get from /var/log/php7.0-fpm.log:
[20-Sep-2017 12:08:21] NOTICE: [pool web3] child 3495 started
[20-Sep-2017 12:08:21] NOTICE: [pool web3] child 2642 exited with code 0 after 499.814492 seconds from start
[20-Sep-2017 12:32:28] WARNING: [pool web3] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 8 children, there are 7 idle, and 57 total children
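The "seems busy" warnings in the excerpt above can be tallied per pool to see how often, and at what child count, a pool runs short of idle workers. The following is a minimal sketch, assuming the log keeps exactly the line format shown in the excerpt; the function name summarize_busy_warnings is hypothetical and not part of PHP-FPM.

```python
import re
from collections import Counter

# Matches the "seems busy" warning format shown in the log excerpt above.
BUSY_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] WARNING: \[pool (?P<pool>[^\]]+)\] seems busy .*"
    r"spawning (?P<spawning>\d+) children, "
    r"there are (?P<idle>\d+) idle, and (?P<total>\d+) total children"
)


def summarize_busy_warnings(lines):
    """Return (warning count per pool, highest total child count per pool)."""
    counts = Counter()
    peak_total = {}
    for line in lines:
        match = BUSY_RE.match(line.strip())
        if match is None:
            continue  # NOTICE lines and other messages are ignored
        pool = match.group("pool")
        counts[pool] += 1
        total = int(match.group("total"))
        peak_total[pool] = max(peak_total.get(pool, 0), total)
    return counts, peak_total


if __name__ == "__main__":
    # Path taken from the question; adjust if the log lives elsewhere.
    with open("/var/log/php7.0-fpm.log") as log_file:
        counts, peak_total = summarize_busy_warnings(log_file)
    for pool, count in counts.items():
        print(f"{pool}: {count} busy warnings, peak {peak_total[pool]} children")
```

A pool whose peak child count approaches its pm.max_children value is the one most likely to be behind the gateway errors.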
Nothing jumps out at me in the nginx log. If I leave it running for too long without restarting it (PHP-FPM), I will get gateway errors. I've tried following tutorials 3 times now tweaking settings but it's still no good. Right now I've got all kinds of settings probably way off but it never works either way I do it.
/etc/nginx/nginx.conf:
user www-data;
worker_processes auto;
pid /run/nginx.pid;
worker_rlimit_nofile 100000;
events {
    worker_connections 4096;
    use epoll;
    multi_accept on;
}
http {
    sendfile on;
    reset_timedout_connection on;
    client_body_timeout 10;
    send_timeout 2;
    keepalive_timeout 30;
    keepalive_requests 100000;
    tcp_nopush on;
    tcp_nodelay on;
    types_hash_max_size 2048;
    fastcgi_read_timeout 300000;
    client_max_body_size 9000m;
    include /etc/nginx/mime.types;
    default_type application/octet-stream;
    ssl_protocols TLSv1 TLSv1.1 TLSv1.2; # Dropping SSLv3, ref: POODLE
    ssl_prefer_server_ciphers on;
    access_log /var/log/nginx/access.log;
    error_log /var/log/nginx/error.log;
    gzip on;
    gzip_disable "msie6";
    gzip_vary on;
    gzip_proxied any;
    gzip_comp_level 6;
    gzip_buffers 16 8k;
    gzip_http_version 1.1;
    gzip_types text/plain text/css application/json application/javascript text/xml application/xml application/xml+rss text/javascript;
    include /etc/nginx/conf.d/*.conf;
    include /etc/nginx/sites-enabled/*;
    open_file_cache max=200000 inactive=20s;
    open_file_cache_valid 30s;
    open_file_cache_min_uses 2;
    open_file_cache_errors on;
    access_log off;
}
/etc/php/7.0/fpm/php-fpm.conf:
[www]
pm = dynamic
pm.max_spare_servers = 200
pm.min_spare_servers = 100
pm.start_servers = 100
pm.max_children = 300
[global]
pid = /run/php/php7.0-fpm.pid
error_log = /var/log/php7.0-fpm.log
include=/etc/php/7.0/fpm/pool.d/*.conf
/etc/php/7.0/fpm/pool.d/www.conf:
[www]
user = www-data
group = www-data
listen = /run/php/php7.0-fpm.sock
listen.owner = www-data
listen.group = www-data
pm = dynamic
pm.max_children = 300
pm.start_servers = 100
pm.min_spare_servers = 100
pm.max_spare_servers = 200
pm.max_requests = 500
One of my sites (/etc/php/7.0/fpm/pool.d/web3.conf):
[web3]
listen = /var/lib/php7.0-fpm/web3.sock
listen.owner = web3
listen.group = www-data
listen.mode = 0660
user = web3
group = client1
pm = dynamic
pm.max_children = 141
pm.start_servers = 20
pm.min_spare_servers = 20
pm.max_spare_servers = 35
pm.max_requests = 500
chdir = /
env[HOSTNAME] = $HOSTNAME
env[TMP] = /var/www/clients/client1/web3/tmp
env[TMPDIR] = /var/www/clients/client1/web3/tmp
env[TEMP] = /var/www/clients/client1/web3/tmp
env[PATH] = /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
Resource/proc usage from htop:
Best Answer
The issue is with your database access. You have several MySQL processes using CPU, which indicates that database queries take a long time to execute.
You need to look into your application, looking for the following things:
The slow database queries then cause PHP-FPM to run out of available child processes to handle client requests, which results in 502 Bad Gateway errors. You can try increasing the pm.max_children setting for the web3 pool, since that is the pool producing the errors. This can relieve the scalability symptoms, but it does not fix the root cause, which is application / database inefficiency.
If you are not using the www pool, you can remove it to save the resources it uses.
The ideal setting for pm.max_requests is zero, that is, PHP workers should never be restarted. If your PHP workers don't leak memory because of badly coded libraries, you can use zero there. Otherwise, use whichever value keeps the memory usage of the workers reasonable. There really isn't any other good advice to give regarding this setting.
There isn't much you can do with the nginx settings here, since it is PHP-FPM that is sometimes unavailable. You could change gzip_comp_level to 1, which makes nginx spend a little less CPU compressing output, but this has a very small effect compared to application optimisation.
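If you do raise pm.max_children, it helps to check that the new value still fits in memory. The sketch below uses a common rule of thumb that is not part of the advice above: divide the RAM left over for PHP by the average size of one worker. The function name estimate_max_children and all of the numbers are hypothetical; measure the real worker size on your own server (for example from the RES column in htop).

```python
def estimate_max_children(total_ram_mb, reserved_mb, avg_worker_mb):
    """Rough upper bound for pm.max_children summed across all pools.

    total_ram_mb  -- physical RAM on the server
    reserved_mb   -- RAM set aside for MySQL, nginx, and the OS (an assumption)
    avg_worker_mb -- measured average resident size of one PHP-FPM worker
    """
    if avg_worker_mb <= 0:
        raise ValueError("avg_worker_mb must be positive")
    available_mb = total_ram_mb - reserved_mb
    if available_mb <= 0:
        return 0
    return int(available_mb // avg_worker_mb)


if __name__ == "__main__":
    # Hypothetical figures: 8 GB server, 3 GB reserved, 40 MB per worker.
    print(estimate_max_children(8192, 3072, 40))
```

The result is a budget shared by every pool on the machine, so the pm.max_children values of all pools together should stay below it. That is one more reason to remove a pool you are not using.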