Nginx – Server crash (504 gateway timeout) with 100 concurrent users, using nginx and php5-fpm

MySQL, nginx, php-fpm, timeout, Ubuntu

We have a VPS which is dedicated to a single website. Day to day it works fine (say 20-50 concurrent users), but as soon as we reach around 90+ concurrent users, the server starts to crash or time out and shows nginx's 504 Gateway Time-out error.

We had some issues earlier in the year where some data-heavy pages took about 7 seconds to load, which we managed to resolve about 90% by optimising MySQL queries and making use of the MySQL query cache. However, that doesn't seem to be helping with this!

When I say data heavy, it is loading approx 5000 records from the DB, through the framework.

The server is running Ubuntu 15.10, with 4 CPUs and 4 GB of memory. MySQL is on its own server with 1 GB of memory. The MySQL server doesn't seem to get past about 30% utilisation, even with 100 users.

MySQL is configured with a 64 MB query_cache_size and a 6 MB query_cache_limit.

We have APC installed, but it doesn't seem to make much difference overall.

This is our nginx.conf:

user www-data;
worker_processes 4;
pid /run/nginx.pid;

events {
    worker_connections 1024;
    # multi_accept on;
}

http {

    ##
    # Basic Settings
    ##

    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout 15;
    types_hash_max_size 2048;
    # server_tokens off;

    # server_names_hash_bucket_size 64;
    # server_name_in_redirect off;

    include /etc/nginx/mime.types;
    default_type application/octet-stream;


    client_body_buffer_size     32k;
    client_header_buffer_size   8k;
    large_client_header_buffers 8 64k;

    #client_body_buffer_size 10K;
    #client_header_buffer_size 1k;
    client_max_body_size 12m;
    #large_client_header_buffers 2 1k;


    fastcgi_cache_path /etc/nginx/cache levels=1:2 keys_zone=microcache:100m inactive=10m max_size=1024m;
    fastcgi_cache_key "$scheme$request_method$host$request_uri";


    ##
    # SSL Settings
    ##

    ssl_protocols TLSv1 TLSv1.1 TLSv1.2; # Dropping SSLv3, ref: POODLE
    ssl_prefer_server_ciphers on;

    ##
    # Logging Settings
    ##

    #access_log /var/log/nginx/access.log;
    error_log /var/log/nginx/error.log;

    ##
    # Gzip Settings
    ##

    gzip on;
    gzip_disable "msie6";
    gzip_comp_level 3;
    gzip_vary on;
    gzip_proxied any;
    gzip_buffers 16 8k;
    gzip_http_version 1.1;
    gzip_types text/plain text/css application/json application/x-javascript text/xml application/xml application/xml+rss text/javascript application/javascript text/x-js;


    ##
    # Virtual Host Configs
    ##

    include /etc/nginx/conf.d/*.conf;
    include /etc/nginx/sites-enabled/*;
}

This is the server block:

server {
    listen 80 default;
    server_name www.website.com;

    root /var/www/website.com/httpdocs;
    index index.php index.html index.htm;

    location / {
            try_files $uri @handler;
    }

    error_page 404 /assets/error-404.html;
    error_page 500 /assets/error-500.html;


    location @handler {
            expires off;

            include fastcgi_params;
            fastcgi_pass unix:/var/run/php5-fpm.sock;

            # fastcgi caching

            #Cache everything by default
            set $no_cache 0;

            if ($request_method !~ ^(GET|HEAD)$) {
                set $no_cache "1";
            }

            #Don't cache the following URLs
            if ($request_uri ~* "/(admin/|member/)")
            {
                    set $no_cache 1;
            }

            #fastcgi_no_cache $no_cache;
            #fastcgi_cache_bypass $no_cache;
            #fastcgi_cache microcache;
            #fastcgi_cache_key $scheme$host$request_uri$request_method;
            #fastcgi_cache_valid 200 301 302 10m;
            #fastcgi_cache_use_stale updating error timeout invalid_header http_500;
            #fastcgi_pass_header Set-Cookie;
            #fastcgi_pass_header Cookie;
            #fastcgi_ignore_headers Cache-Control Expires Set-Cookie;

            fastcgi_param SCRIPT_FILENAME $document_root/framework/main.php;
            fastcgi_param SCRIPT_NAME /framework/main.php;
            fastcgi_param QUERY_STRING url=$uri&$args;

            fastcgi_buffer_size 32k;
            fastcgi_buffers 4 32k;
            fastcgi_busy_buffers_size 64k;
    }

}

These are the relevant settings from pool.d/www.conf:

pm = dynamic
pm.max_children = 30
pm.start_servers = 2
pm.min_spare_servers = 1
pm.max_spare_servers = 4
pm.max_requests = 500

PHP is configured with 128 MB of memory, but each process usually uses around 70 MB.

I didn't manage to get a top while it was at 100 users, but this is the usual state:

             total       used       free     shared    buffers     cached
Mem:          3951       3793        157        114        273       2918
-/+ buffers/cache:        602       3348
Swap:            0          0          0

You'll see I did some experimenting with nginx's fastcgi_cache, which made a huge difference to performance (load times of 50-100 ms). However, the website has a lot of user functionality (uploads, modifying content, etc.) which didn't work with it enabled.

I would like to re-look at fastcgi_cache but I feel that we must be able to get a better result on this current server without it?!

Been battling this one for a while now so any help would be great.

Best Answer

You have set pm.max_children to 30, which means that at most 30 PHP scripts can run at the same time.

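When more users visit your site, there aren't any free PHP processes to serve the extra requests, so they have to wait for one to become available. nginx waits for a limited time (fastcgi_read_timeout, 60 seconds by default) before returning the 504 Gateway Time-out error.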
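To make the queueing effect concrete, here is a rough back-of-the-envelope model in Python. It is only a sketch under simplifying assumptions: requests are served first come, first served, every page takes the same time, and the 2-second page time is a hypothetical figure I picked for illustration, not something measured on your server. The 60-second limit is nginx's default fastcgi_read_timeout.

```python
def completion_time(position: int, workers: int, service_time_s: float) -> float:
    """Seconds until the request at 0-based queue `position` finishes.

    Assumes FIFO order and identical service times (a simplification).
    """
    # Each "wave" is one batch of `workers` requests running in parallel.
    waves_ahead = position // workers
    return (waves_ahead + 1) * service_time_s


WORKERS = 30            # pm.max_children from the question
SERVICE_TIME_S = 2.0    # hypothetical time PHP spends on one page
NGINX_TIMEOUT_S = 60.0  # nginx's default fastcgi_read_timeout

for burst in (30, 100, 1000):
    last = completion_time(burst - 1, WORKERS, SERVICE_TIME_S)
    status = "504 likely" if last > NGINX_TIMEOUT_S else "served"
    print(f"burst of {burst}: last request done after {last:.0f}s ({status})")
```

The point is only that once the backlog per worker grows large enough, the requests at the end of the queue wait longer than nginx is willing to, no matter how idle the database is.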

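You seem to have plenty of available memory: the cached column shows about 2.9 GB used by the page cache, which the kernel can release when applications need it, and the "-/+ buffers/cache" line shows about 3.3 GB effectively free.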

You should check the average memory usage of your PHP processes with the top command. The figure we are interested in is the RES column. Divide 2 GB by that number, and you'll get a safe value for the pm.max_children setting.
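The division above is simple enough to do by hand, but here it is as a small Python sketch. The function name is made up for illustration. The 70 MB figure is the rough per-process size quoted in the question, and the 40 MB figure is purely hypothetical, to show how much the result depends on what you actually measure in the RES column.

```python
def safe_max_children(memory_budget_mb: float, avg_process_res_mb: float) -> int:
    """Return how many PHP-FPM workers fit into the memory budget."""
    if avg_process_res_mb <= 0:
        raise ValueError("average process size must be positive")
    # Round down: a partially fitting worker does not fit.
    return int(memory_budget_mb // avg_process_res_mb)


BUDGET_MB = 2 * 1024  # the 2 GB budget suggested above

print(safe_max_children(BUDGET_MB, 70))  # ~70 MB per process, from the question -> 29
print(safe_max_children(BUDGET_MB, 40))  # hypothetical smaller process size -> 51
```

Note that with the question's 70 MB estimate, a 2 GB budget comes out at roughly the current value of 30, so it is worth measuring the real RES figure before picking a number.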

You should also consider raising the values of pm.start_servers, pm.min_spare_servers and pm.max_spare_servers.

Spare servers are processes that are available to serve requests immediately. Otherwise the PHP process manager needs to launch a process separately, which takes some time.
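When raising the spare server values, they have to stay consistent with each other and with pm.max_children, otherwise PHP-FPM will refuse the pool configuration. The Python sketch below checks the relationships as I understand them from PHP-FPM's behaviour for pm = dynamic; treat the rules as an assumption and verify them against the PHP-FPM documentation for your version. The function name and the "raised" values are hypothetical examples, not recommendations.

```python
def check_dynamic_pool(max_children: int, start_servers: int,
                       min_spare: int, max_spare: int) -> list[str]:
    """Return a list of problems with a pm = dynamic pool (empty if none)."""
    problems = []
    if min_spare <= 0 or max_spare <= 0:
        problems.append("spare server counts must be positive")
    if max_spare < min_spare:
        problems.append("pm.max_spare_servers is less than pm.min_spare_servers")
    if min_spare > max_children or max_spare > max_children:
        problems.append("spare server counts exceed pm.max_children")
    if not (min_spare <= start_servers <= max_spare):
        problems.append("pm.start_servers is outside the min/max spare range")
    return problems


# Settings from the question: consistent, but with very few spare workers.
print(check_dynamic_pool(max_children=30, start_servers=2, min_spare=1, max_spare=4))

# Hypothetical raised values that keep the same relationships.
print(check_dynamic_pool(max_children=30, start_servers=10, min_spare=5, max_spare=15))

# Hypothetical mistake: start_servers above max_spare_servers.
print(check_dynamic_pool(max_children=30, start_servers=20, min_spare=5, max_spare=15))
```

The first two calls print an empty list, and the third reports that pm.start_servers is out of range.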