Nginx – Where is the bottleneck – nginx slow file serve


I have nginx 1.6.2, on Ubuntu 14+.

I host a couple of "big" files, from 20 MB to 60 MB, which are constantly accessed (there can be more than 1000 simultaneous connections at evening peaks). They rarely change, about once every week or two.

The server also hosts lots of user-uploaded images, which are also constantly accessed but can be changed by users at will. Images are uploaded via PHP.

So, the problem is that those big files are really slow to download. Sometimes the connection even drops. Here are some logs and configs:

Nginx config

 user www-data;
 worker_processes auto; 
 pid /run/;

 worker_rlimit_nofile 65000;

 events {
         worker_connections 4096;
         multi_accept on;
         use epoll; }

 http {
        client_body_buffer_size 50m;

        sendfile off;
        output_buffers 2 64k;
        tcp_nopush on;
        tcp_nodelay on;
        keepalive_timeout 120s;
        types_hash_max_size 2048;

        directio 10m;

        include /etc/nginx/mime.types;
        default_type application/octet-stream;

        ssl_protocols TLSv1 TLSv1.1 TLSv1.2; # Dropping SSLv3, ref: POODLE
        ssl_prefer_server_ciphers on;

        access_log off; #/var/log/nginx/access.log;
        error_log /var/log/nginx/error.log crit;

        gzip on;
        gzip_disable "msie6";

        gzip_static on;
        gzip_comp_level 5;
        gzip_min_length 1024;
        gzip_proxied any;
        gzip_types text/plain application/xml application/x-javascript text/javascript text/css text$

        open_file_cache max=1000 inactive=5m;
        open_file_cache_valid 2m;
        open_file_cache_min_uses 1;
        open_file_cache_errors on;

        include /etc/nginx/conf.d/*.conf;
        include /etc/nginx/sites-enabled/*;
 }

Here is the site config:

server {
        listen 80;
        server_name _;

        root /var/www/;

        location / {
                try_files $uri $uri/ =404;
        }

        location ~* \.php$ {
                try_files $uri =404;
                fastcgi_pass unix:/var/run/php5-fpm.sock;
                fastcgi_index index.php;
                fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
                include fastcgi_params;
        }

        location ~* .(unity3d|html)$ { #yep, that's the ones 20mb-50mb file types
                expires 7d;
        }

        location /uploads/customimages {
                location ~\.php$ {return 403;}
        }

        location /uploads/temp {
                location ~\.php$ {return 403;}
        }

        location ~\.php$ {
                try_files $uri = 404;
                fastcgi_pass unix:/var/run/php5-fpm.sock;
                fastcgi_index index.php;
                fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
                include fastcgi_params;
        }
}

The HTTPS server block is almost the same.

What I've checked: processor, memory, SSD I/O and network.

Here is a slice of dstat output; it looks almost the same throughout.

----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw
  3   0  96   0   0   0|2066k  728k|   0     0 |   0     0 |4251    11k
  3   0  96   1   0   0|6080k    0 | 219k 6722k|   0     0 |5633  6856
  2   0  96   1   0   0|8128k    0 | 255k 8089k|   0     0 |5988  6895
  3   0  96   1   0   0|5888k   72k| 232k 6811k|   0     0 |5818  7074
  2   0  97   1   0   0|6592k   48k| 231k 7228k|   0     0 |5745  7721
  3   1  95   1   0   0|7104k    0 | 243k 7903k|   0     0 |6024  7439
  3   1  96   1   0   0|6272k 1440k| 220k 6759k|   0     0 |6232  9463
  4   1  94   1   0   0|6784k    0 | 216k 6984k|   0     0 |5893  9412
  4   1  94   1   0   0|7040k    0 | 718k 7723k|   0     0 |6551    13k
  5   1  94   1   0   1|8000k  108k| 478k 8926k|   0     0 |6647    13k
  4   1  94   1   0   0|  10M  192k| 402k   10M|   0     0 |6665    14k
  4   1  95   1   0   0|  11M 2772k|1080k   12M|   0     0 |7188    12k
  4   1  94   1   0   0|9536k    0 |1012k   10M|   0     0 |6758    14k
  5   1  93   1   0   0|9472k  504k|1734k   10M|   0     0 |7298    13k
  3   1  95   1   0   0|9984k    0 |1027k   10M|   0     0 |6776    11k
  3   0  95   1   0   0|  10M    0 | 355k   11M|   0     0 |6550  8152
  3   1  95   1   0   0|  11M 2784k| 498k   14M|   0     0 |7403    10k
  8   2  89   1   0   0|6592k  228k| 311k 7482k|   0     0 |7394  9242
  3   0  96   1   0   0|5760k    0 | 211k 6076k|   0     0 |5807  8031
  5   1  94   1   0   0|6720k  180k| 201k 6378k|   0     0 |5949  9432
  3   0  96   1   0   0|6400k    0 | 234k 7113k|   0     0 |5866    11k
  4   1  95   1   0   0|5696k  672k| 231k 6988k|   0     0 |6128    10k
  5   1  94   1   0   0|6336k  108k| 189k 5477k|   0     0 |6002    12k
  8   1  90   1   0   0|5120k    0 | 181k 5832k|   0     0 |6445    18k
  4   1  94   1   0   0|6272k   84k| 212k 7521k|   0     0 |6656    21k
  6   1  92   1   0   0|7488k    0 | 237k 8334k|   0     0 |7109    23k
  5   1  94   1   0   0|8960k 3264k| 278k 9362k|   0     0 |7360    22k
  4   1  94   1   0   0|8896k   12k| 326k 9736k|   0     0 |7190    15k
  3   0  96   1   0   0|4864k  168k| 191k 5179k|   0     0 |6123    13k

Memory, as reported by free:

             total       used       free     shared    buffers     cached
Mem:      32840320    9392976   23447344       6652     309472    4688688
-/+ buffers/cache:    4394816   28445504
Swap:      1569780          0    1569780

Network, from lshw -C network:

       description: Ethernet interface
       product: 82579V Gigabit Network Connection
       vendor: Intel Corporation
       physical id: 19
       bus info: pci@0000:00:19.0
       logical name: eth0
       version: 05
       serial: e0:69:95:72:8e:25
       size: 1Gbit/s
       capacity: 1Gbit/s
       width: 32 bits
       clock: 33MHz
       capabilities: pm msi bus_master cap_list ethernet physical tp 10bt 10bt-fd 100bt 100bt-fd 1000bt-fd autonegotiation
       configuration: autonegotiation=on broadcast=yes driver=e1000e driverversion=2.3.2-k duplex=full firmware=0.13-4 ip= latency=0 link=yes multicast=yes port=twisted pair speed=1Gbit/s
       resources: irq:43 memory:fe500000-fe51ffff memory:fe524000-fe524fff ioport:f080(size=32)
  *-network:0 DISABLED
       description: Ethernet interface
       physical id: 3
       logical name: dummy0
       serial: 56:14:e3:9e:36:59
       capabilities: ethernet physical
       configuration: broadcast=yes
  *-network:1 DISABLED
       description: Ethernet interface
       physical id: 4
       logical name: bond0
       serial: ce:5f:11:d3:5b:01
       capabilities: ethernet physical
       configuration: autonegotiation=off broadcast=yes driver=bonding driverversion=3.7.1 firmware=2 link=no master=yes multicast=yes

I'm pretty sure the hardware is okay and the problem is somewhere in Ubuntu or nginx.

ulimit -n

Sorry for the formatting.

Best Answer

This is surely related to directio.

You probably don't want to use it: it bypasses the kernel's caching mechanisms, which is a loss when your big files rarely change, and its reads are mostly synchronous.

From man 2 open:

O_DIRECT (since Linux 2.4.10)

              Try to minimize cache effects of the I/O to and from this
              file.  In general this will degrade performance, but it is
              useful in special situations, such as when applications do
              their own caching.  File I/O is done directly to/from user-
              space buffers.  The O_DIRECT flag on its own makes an effort
              to transfer data synchronously, but does not give the
              guarantees of the O_SYNC flag that data and necessary metadata
              are transferred.  To guarantee synchronous I/O, O_SYNC must be
              used in addition to O_DIRECT.

              A semantically similar (but deprecated) interface for block
              devices is described in raw(8).
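
Applied to the config in the question, dropping directio could look like the sketch below. It assumes the big files fit comfortably in the page cache (the free output shows plenty of unused memory) and that nothing else in the setup relies on sendfile being off. Only the directives shown change; treat it as a starting point to test, not a tuned configuration.

```nginx
http {
    # Let the kernel send file data from the page cache straight to the socket.
    sendfile   on;
    tcp_nopush on;   # only takes effect when sendfile is enabled

    # directio 10m;  # removed: files of 10m and larger were opened with O_DIRECT,
    #                # so every request bypassed the page cache

    # ... remaining directives from the question unchanged ...
}
```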

If you think that too many big files are being served at the same time and that directio is necessary because caching would be inefficient, check for iowait; if you see it, enable aio.

PS: aio and sendfile can be used together: sendfile for files smaller than the directio threshold, aio for bigger ones.
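
If iowait does show up and you decide to keep directio, the combination could be sketched as follows. This assumes nginx was built with file AIO support (--with-file-aio) and reuses the 10m threshold from the question. The location pattern is the one from the question with the dot escaped, and the output_buffers values are illustrative rather than tuned.

```nginx
location ~* \.(unity3d|html)$ {
    sendfile       on;       # used for files below the directio threshold
    aio            on;       # on Linux, used for files at or above the threshold
    directio       10m;      # on Linux, aio needs directio, otherwise reads block
    output_buffers 2 512k;   # illustrative buffer sizes for the aio/directio reads
    expires        7d;
}
```

Watching the wai column in dstat before and after the change is a simple way to see whether it helps.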