Nginx – Configuring Nginx + PHP-FPM For High Traffic Load

nginx, php-fpm

My nginx keeps crashing and reporting "bad gateway" errors in the browser. Nginx and PHP-FPM don't come preconfigured to handle large traffic loads. I had to put an hourly systemctl restart php7.0-fpm cron job in place just to make sure my sites don't stay down for too long when they go down. Let's get down to it.

Some errors I get from /var/log/php7.0-fpm.log:

[20-Sep-2017 12:08:21] NOTICE: [pool web3] child 3495 started
[20-Sep-2017 12:08:21] NOTICE: [pool web3] child 2642 exited with code 0 after 499.814492 seconds from start

[20-Sep-2017 12:32:28] WARNING: [pool web3] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 8 children, there are 7 idle, and 57 total children

Nothing jumps out at me in the nginx log. If I leave PHP-FPM running for too long without restarting it, I get gateway errors. I've followed tutorials three times now, tweaking settings, but it's still no good. Right now many of my settings are probably way off, but it doesn't work whichever way I set them.

/etc/nginx/nginx.conf:

user www-data;
worker_processes auto;
pid /run/nginx.pid;

worker_rlimit_nofile 100000;

events {
        worker_connections 4096;
        use epoll;
        multi_accept on;
}


http {
        sendfile on;
        reset_timedout_connection on;
        client_body_timeout 10;
        send_timeout 2;
        keepalive_timeout 30;
        keepalive_requests 100000;
        tcp_nopush on;
        tcp_nodelay on;
        types_hash_max_size 2048;
        fastcgi_read_timeout 300000;
        client_max_body_size 9000m;
        include /etc/nginx/mime.types;
        default_type application/octet-stream;
        ssl_protocols TLSv1 TLSv1.1 TLSv1.2; # Dropping SSLv3, ref: POODLE
        ssl_prefer_server_ciphers on;
        access_log /var/log/nginx/access.log;
        error_log /var/log/nginx/error.log;
        gzip on;
        gzip_disable "msie6";
        gzip_vary on;
        gzip_proxied any;
        gzip_comp_level 6;
        gzip_buffers 16 8k;
        gzip_http_version 1.1;
        gzip_types text/plain text/css application/json application/javascript text/xml application/xml application/xml+rss text/javascript;

        include /etc/nginx/conf.d/*.conf;
        include /etc/nginx/sites-enabled/*;
        open_file_cache max=200000 inactive=20s;
        open_file_cache_valid 30s;
        open_file_cache_min_uses 2;
        open_file_cache_errors on;

        access_log off;
}

/etc/php/7.0/fpm/php-fpm.conf:

    [www]

    pm = dynamic
    pm.max_spare_servers = 200
    pm.min_spare_servers = 100
    pm.start_servers = 100
    pm.max_children = 300

    [global]
    pid = /run/php/php7.0-fpm.pid
    error_log = /var/log/php7.0-fpm.log
    include=/etc/php/7.0/fpm/pool.d/*.conf

/etc/php/7.0/fpm/pool.d/www.conf:

[www]

user = www-data
group = www-data
listen = /run/php/php7.0-fpm.sock
listen.owner = www-data
listen.group = www-data
pm = dynamic
pm.max_children = 300
pm.start_servers = 100
pm.min_spare_servers = 100
pm.max_spare_servers = 200
pm.max_requests = 500

One of my sites (/etc/php/7.0/fpm/pool.d/web3.conf):

[web3]

listen = /var/lib/php7.0-fpm/web3.sock
listen.owner = web3
listen.group = www-data
listen.mode = 0660

user = web3
group = client1

pm = dynamic
pm.max_children = 141
pm.start_servers = 20
pm.min_spare_servers = 20
pm.max_spare_servers = 35
pm.max_requests = 500

chdir = /

env[HOSTNAME] = $HOSTNAME
env[TMP] = /var/www/clients/client1/web3/tmp
env[TMPDIR] = /var/www/clients/client1/web3/tmp
env[TEMP] = /var/www/clients/client1/web3/tmp
env[PATH] = /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

Resource/proc usage from htop:

[Image: htop screenshot]

Best Answer

The issue is with your database access. The htop output shows several MySQL processes using CPU, which indicates that database queries take a long time to execute.

You need to review your application and check the following:

  1. Database queries are properly optimised.
  2. Database design is efficient, and proper indexing is in place.
  3. Application has proper data caches in place.
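One way to find the queries worth optimising is the MySQL slow query log. The fragment below is a minimal sketch, not a tuned configuration: the log file path and the 1-second threshold are assumptions, and the option file location varies by distribution (often somewhere under /etc/mysql/ on Debian/Ubuntu).

```ini
; MySQL option file (exact location is an assumption, check your distribution)
[mysqld]
; Record statements that run longer than long_query_time
slow_query_log = 1
; Hypothetical path; the mysql user must be able to write here
slow_query_log_file = /var/log/mysql/mysql-slow.log
; Threshold in seconds; 1 is only an illustrative starting point
long_query_time = 1
; Also log queries that do not use an index (can be noisy)
log_queries_not_using_indexes = 1
```

After restarting MySQL, the statements that show up most often in this log are the first candidates for better indexes, query rewrites, or caching.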

Slow database queries keep each PHP-FPM child process busy for longer, so the pool runs out of free children to handle incoming client requests. Nginx then gets no response from PHP-FPM and returns 502 Bad Gateway errors. You can try increasing the pm.max_children setting for the web3 pool, since that is the pool reporting the problem. This can relieve the symptoms, but it does not fix the root cause, which is application/database inefficiency.
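To confirm that the web3 pool really is running out of children, and to see which PHP code is waiting, PHP-FPM's slow log and status page can help. This is a sketch under assumptions: the 5-second timeout, the slow log path, and the /fpm-status URL are illustrative values, not taken from the original setup.

```ini
; /etc/php/7.0/fpm/pool.d/web3.conf (additions to the existing [web3] pool)
[web3]
; Dump a PHP backtrace for any request running longer than this
; (5s is an assumed threshold; 0 disables the slow log)
request_slowlog_timeout = 5s
; Hypothetical log path; it must be writable by PHP-FPM
slowlog = /var/log/php7.0-fpm-web3.slow.log
; Enable the built-in status page for this pool (URL is an assumption)
pm.status_path = /fpm-status
```

The status page is only reachable if nginx passes that URL to the pool's socket, ideally restricted to trusted addresses. Its "max children reached" and "listen queue" counters show whether requests are waiting for a free worker, and backtraces in the slow log that end in database calls would support the diagnosis above.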

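If you are not using the www pool, you can remove it to save the resources it uses. With pm.start_servers = 100 and pm.min_spare_servers = 100, it keeps at least 100 PHP processes running even when idle.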

The ideal setting for pm.max_requests is zero, which means PHP workers are never restarted. If your PHP workers don't leak memory (for example, because of badly coded libraries), you can use zero. Otherwise, use whatever value keeps the workers' memory usage reasonable. There isn't really any other good advice to give regarding this setting.
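In the pool file, the two options look like this. It is only a sketch: whether zero is safe depends on how the workers' memory behaves over time, which you would have to observe on your own server.

```ini
; /etc/php/7.0/fpm/pool.d/web3.conf
[web3]
; 0 = never recycle workers; suitable only if they do not leak memory
pm.max_requests = 0
; Alternative if memory usage keeps growing: recycle each worker after
; N requests (500 is the value already present in the question's config)
;pm.max_requests = 500
```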

There isn't much you can do with nginx settings here, since it is PHP-FPM that is sometimes unavailable. You could change gzip_comp_level to 1, which makes nginx spend a little less CPU compressing output, but the effect is very small compared to application optimisation.