Nginx – Identifying performance bottleneck on a high-end VPS (Apache 2.4 event_mpm/lighttpd/nginx)

apache-2.4, lighttpd, nginx, openvz, vps

I have quite a high-end VPS from vpsblast (their SSD13) – 4 cores, 16 GB RAM, 320 GB of SSD storage on a 1 GigE Internet backbone (virtually uncontended). As near as I can tell it is running OpenVZ (simfs is used, and user_beancounters exists). The database is on a different node in the same datacenter, and I'm running php-fpm, but this test concerns a static 9.49 KB image (php-fpm is flying and the app is extremely optimised). All requests are over HTTPS, so I've run both HTTP and HTTPS tests to determine whether SSL is the issue, but I'm not convinced it is. The OS is Ubuntu 12.04 LTS.

I've tested Apache 2.4 (with the event MPM), nginx and lighttpd, and I'm seeing very similar performance from all three, which leads me to believe the httpd itself is not the issue. I'm currently using Apache 2.4 for the purpose of these questions. My performance on the static object peaks at about 400 rps (requests per second). That's roughly 3.7 MB/s (about 30 Mbps), way under the limit of the 1 GigE line.
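In case it helps to reproduce the test shape locally, a run along these lines (the URL is a placeholder, not my real one) approximates what blitz.io is doing at a single fixed concurrency:

    # apache2-utils ab: 20k requests, 100 concurrent clients, over HTTPS
    ab -n 20000 -c 100 https://example.com/static/test.png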

So first question: what performance should I be seeing on this sort of setup? In a discussion in #apache on FreeNode it was suggested that 10k concurrency shouldn't be impossible, and I should be able to serve 10k requests per second. Are those expectations unreasonable?

The next question is around identifying the performance bottleneck. I honestly don't have any idea where to start looking, as everything looks fine (I've included screenshots from atop below). I have made no sysctl tweaks, as they seem to mostly be controlled by the host OS. I have increased the soft and hard ulimits in /etc/security/limits.conf:

www-data hard nofile 1048576
www-data soft nofile 1048576
root hard nofile 1048576
root soft nofile 1048576
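(One way to confirm the raised limit actually reaches the worker processes is to check them while httpd is running – the process name below is an assumption, apache2 for the packaged build or httpd for a source build:)

    # show the effective open-files limit of each running worker
    for pid in $(pgrep -f 'apache2|httpd'); do
        grep 'Max open files' /proc/$pid/limits
    done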

My apache httpd.conf is pretty standard for 2.4, but here are the changes I've made:

DocumentRoot "/var/www"
<Directory "/var/www">
    Options Indexes FollowSymLinks
    AllowOverride All
    Require all granted
</Directory>

<IfModule ssl_module>
SSLRandomSeed startup builtin
SSLRandomSeed connect builtin
</IfModule>

<IfModule setenvif_module>
BrowserMatch "MSIE 10.0;" bad_DNT
</IfModule>
<IfModule headers_module>
RequestHeader unset DNT env=bad_DNT
</IfModule>
<IfModule mod_deflate.c>
SetOutputFilter DEFLATE
</IfModule>
# Netscape 4.x has some problems…
BrowserMatch ^Mozilla/4 gzip-only-text/html
# Netscape 4.06-4.08 have some more problems
BrowserMatch ^Mozilla/4\.0[678] no-gzip
# MSIE masquerades as Netscape, but it is fine
BrowserMatch \bMSIE !no-gzip !gzip-only-text/html

# Don’t compress already-compressed files
SetEnvIfNoCase Request_URI \.(?:gif|jpe?g|png)$ no-gzip dont-vary
SetEnvIfNoCase Request_URI \.(?:exe|t?gz|zip|bz2|sit|rar)$ no-gzip dont-vary
SetEnvIfNoCase Request_URI \.(?:avi|mov|mp3|mp4|rm|flv|swf|mp?g)$ no-gzip dont-vary
SetEnvIfNoCase Request_URI \.pdf$ no-gzip dont-vary

Header append Vary User-Agent env=!dont-vary
ProxyPassMatch ^/(.*\.php(/.*)?)$ fcgi://127.0.0.1:9000/var/www/$1

SSLEngine on
SSLOptions +StrictRequire
SSLProtocol -all +TLSv1 +SSLv3
SSLCipherSuite ALL:!kEDH:!ADH:!SSLv2:!EXPORT56:!EXPORT40:!RC4:!DES:+HIGH:+MEDIUM:+EXP
SSLRandomSeed startup file:/dev/urandom 1024
SSLRandomSeed connect file:/dev/urandom 1024
SSLSessionCache        "shmcb:/usr/local/apache2/logs/ssl_scache(512000)"
SSLSessionCacheTimeout  300
# Masked keys for privacy:)
SSLCertificateFile /usr/local/apache2/conf/xxxxx.crt
SSLCertificateKeyFile /usr/local/apache2/conf/xxxxx.key
SSLVerifyClient none
SSLProxyEngine off
<IfModule mod_mime.c>
    AddType application/x-x509-ca-cert      .crt
    AddType application/x-pkcs7-crl         .crl
</IfModule>
SetEnvIf User-Agent ".*MSIE.*" nokeepalive ssl-unclean-shutdown downgrade-1.0 force-response-1.0

ServerTokens Prod
Timeout 300
KeepAlive Off
<IfModule mpm_event_module>
    StartServers          5
    MaxClients         1024
    MinSpareThreads      50
    MaxSpareThreads     150
    ThreadLimit          64
    ThreadsPerChild      64
    MaxRequestsPerChild 20000
    ListenBacklog      4096
</IfModule>
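(To double-check that the running binary really is using the event MPM – apachectl may be apache2ctl on a packaged Ubuntu install:)

    # should report something like "Server MPM:     event"
    apachectl -V | grep -i mpm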

I don't think it's a limit in the OpenVZ config; here is the output of user_beancounters (the limits are fairly high):

Version: 2.5
   uid  resource                     held              maxheld              barrier                limit              failcnt
 1592:  kmemsize                 83776469            113721344           2369781760           2606759936                    0
        lockedpages                  4161                10616               578560               578560                    0
        privvmpages                670407              2743929  9223372036854775807  9223372036854775807                    0
        shmpages                     5770                 7450              1048576              1048576                    0
        dummy                           0                    0                    0                    0                    0
        numproc                       233                 1044                 3560                 3560                    0
        physpages                  157907               290092                    0              4194304                    0
        vmguarpages                     0                    0              4194304  9223372036854775807                    0
        oomguarpages                49397                83795              4194304  9223372036854775807                    0
        numtcpsock                     23                 1317                57330                57330                    0
        numflock                        4                   11                32768                36045                    0
        numpty                          2                    9                  256                  256                    0
        numsiginfo                      1                   30                  256                  256                    0
        tcpsndbuf                  512360             31732952            293529600            440294400                    0
        tcprcvbuf                  376832             21577728            293529600            440294400                    0
        othersockbuf                52400               360896            146764800            293529600                    0
        dgramrcvbuf                     0                 6936             14676480             14676480                    0
        numothersock                   61                   95                57330                57330                    0
        dcachesize               28028491             50196571            457560436            503316480                    0
        numfile                       918                 2315               655360               655360                    0
        dummy                           0                    0                    0                    0                    0
        dummy                           0                    0                    0                    0                    0
        dummy                           0                    0                    0                    0                    0
        numiptent                      24                   24                 8448                 8448                    0

In terms of identifying the performance issue, here's an album of performance data. The first two images are the atop output halfway through the blitz.io test and then close to the end under maximum load; the third image is the blitz.io report. The fourth and fifth are the same (atop + blitz.io report) for the same static object but with SSL disabled. The blitz.io tests go from 1 to 1000 concurrency over 60 seconds. Whilst there is a marked overhead with SSL enabled, I'm still not approaching anything close to the performance I'm expecting, and increasing the concurrency on blitz.io makes things even worse. So, I leave this to your wisdom – feel free to ask for any clarifications, and to suggest any changes for me to try and re-test. :)
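(One more thing I can watch during the next run – since I've set ListenBacklog to 4096 but sysctls such as net.core.somaxconn seem to be controlled by the host – is whether the listen queue is overflowing:)

    # growing counters here would point at the accept/listen queue rather than the httpd
    netstat -s | grep -iE 'listen|overflow'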

Best Answer

Since the event MPM relies on the underlying worker configuration, try the following config as an experiment. If the changes result in a measurable difference, that would mean this is at least one of the bottlenecks, and it can then be tuned further.

These are the Apache defaults:

  • ServerLimit 16 - you can see this as being reached in the image you sent - change to 50
  • StartServers 2 - this is for initial startup - change to 5
  • MaxClients 150 - change this to 300
  • MinSpareThreads 25 - change to 50
  • MaxSpareThreads 75 - change to 150
  • ThreadsPerChild 25 - change to 50

Since we are increasing these values by a factor of 2–3, you should see a roughly linear improvement of about the same factor.

Edit – the configuration below is what improved things: MaxClients (now known as MaxRequestWorkers) was the bottleneck. Once the server can actually accept that many clients, just make sure the product of child processes and threads per child does not exceed that number.

  <IfModule mpm_event_module>
    StartServers          5
    ServerLimit          32
    MinSpareThreads      64
    MaxSpareThreads     128
    ThreadsPerChild      64
    ThreadLimit          64
    MaxRequestWorkers   2048
    MaxRequestsPerChild 20000
    ListenBacklog      4096
  </IfModule>
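For what it's worth, the arithmetic here is consistent: ServerLimit 32 × ThreadsPerChild 64 = 2048 threads, which is exactly MaxRequestWorkers 2048. Setting MaxRequestWorkers above ServerLimit × ThreadsPerChild has no effect, so the three values need to be raised together.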