Centos – Drupal on an NFS share has terrible performance

Tags: centos, drupal, nfs, vmware-esxi

We have a Drupal 7 site running on a VMware ESXi 4.1 host with two VMs: a web VM and an NFS VM. The web VM runs Apache with mod_php. The site is still in development, so we have had to turn off all forms of caching because the files are updated frequently.

Each page request takes around 15-20 seconds to complete. Profiling the PHP code shows that the vast majority of the time (normally over 90%) is taken by the is_dir() and is_file() calls made while loading the modules.

I've increased PHP's realpath cache size to several megabytes, and an strace shows that lstat() calls then drop from over 200 to around 6, while stat() calls fall only slightly (by around 600). However, while this has shaved off quite a bit of time, I am simply unable to get below 10 seconds per request for the most demanding page.
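To put a number on the per-call cost outside of PHP, a small timing script can help. This is only a minimal sketch: the two paths in the `__main__` block are placeholders (the second one, a local copy of the same tree, is an assumption and not part of my setup), and because the directory walk itself touches every entry first, the figures are best read as a rough lower bound.

```python
import os
import time


def time_stat_calls(root, limit=1000):
    """Time os.stat() on up to `limit` paths under `root`.

    Returns (number_of_calls, mean_microseconds_per_call).
    """
    paths = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            paths.append(os.path.join(dirpath, name))
            if len(paths) >= limit:
                break
        if len(paths) >= limit:
            break

    start = time.perf_counter()
    for path in paths:
        try:
            os.stat(path)
        except OSError:
            pass  # a failed stat still costs a round trip, so it stays in the timing
    elapsed = time.perf_counter() - start

    calls = len(paths)
    mean_us = (elapsed / calls) * 1e6 if calls else 0.0
    return calls, mean_us


if __name__ == "__main__":
    # Placeholder paths: the NFS-mounted tree and a hypothetical local copy of it.
    for label, root in (("nfs", "/path/to/website/files"),
                        ("local", "/path/to/local/copy")):
        calls, mean_us = time_stat_calls(root)
        print(f"{label}: {calls} stat() calls, {mean_us:.0f} us/call")
```

Comparing the two us/call figures shows how much of the request time is plain metadata latency rather than anything Drupal-specific.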

Is there a way to get better performance out of this setup that doesn't involve caching?

Edit: MySQL is not the issue; with query caching, requests take a second at most to complete.

Configs and stats:

VM Host: single quad-core Xeon CPU

VMs:

web – CentOS 6 64-bit, 2.5GB RAM, normal CPU/HD prioritisation (2 cores)
nfs – CentOS 6 64-bit, 2GB RAM, normal CPU priority (4 cores), high HD priority

PHP: 32M realpath cache size (it's this high for testing purposes)

NFS:

~]# egrep -v '#|^$' /etc/nfsmount.conf 
[ NFSMount_Global_Options ]
 Defaultvers=4
 Ac=False
 Rsize=32k
 Wsize=32k
 Bsize=32k

Read throughput over NFS is not an issue. A dd of a 100M test file using 32k blocks returns:

3200+0 records in
3200+0 records out
104857600 bytes (105 MB) copied, 1.84984 s, 56.7 MB/s

real    0m1.857s
user    0m0.007s
sys     0m0.330s

Strace on Apache process with empty realpath cache:

    % time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 50.78    1.157452         337      3434        28 stat
 32.58    0.742656         628      1182       425 open
  9.29    0.211788         762       278         1 lstat
  3.17    0.072322           0    237865           write
  2.45    0.055839         490       114        13 access
  0.45    0.010262          43       237           brk
  0.34    0.007725          10       811        74 read
  0.28    0.006340           9       679           fstat
  0.22    0.005069          18       281           poll
  0.20    0.004533           6       698           getdents
  0.09    0.001960          10       190           mmap
  0.05    0.001065          14        74           accept4
  0.04    0.001000         333         3           chdir
  0.03    0.000750           4       190           munmap
  0.01    0.000339           0       836           close
  0.01    0.000247           3        75           writev
  0.00    0.000068           0       611           fcntl
  0.00    0.000063           1        77           shutdown
  0.00    0.000000           0         1           lseek
  0.00    0.000000           0         5           rt_sigaction
  0.00    0.000000           0         1           rt_sigprocmask
  0.00    0.000000           0         3           setitimer
  0.00    0.000000           0         5           socket
  0.00    0.000000           0         5         5 connect
  0.00    0.000000           0        74           getsockname
  0.00    0.000000           0        15           setsockopt
  0.00    0.000000           0         5           getcwd
  0.00    0.000000           0         1           futex
------ ----------- ----------- --------- --------- ----------------

Strace on Apache process after realpaths are cached:

    % time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 60.14    1.371006         484      2831        28 stat
 31.79    0.724705         627      1155       425 open
  3.53    0.080354           0    237865           write
  2.65    0.060433         530       114        13 access
  0.43    0.009913          99       100           brk
  0.38    0.008730          11       804        74 read
  0.35    0.007910          12       675           fstat
  0.30    0.006775          10       654           getdents
  0.13    0.003065          11       281           poll
  0.09    0.002000         333         6         1 lstat
  0.07    0.001545           2       807           close
  0.05    0.001063          14        74           accept4
  0.04    0.001000           6       179           mmap
  0.02    0.000404           2       179           munmap
  0.01    0.000271           4        75           writev
  0.01    0.000212           0       611           fcntl
  0.01    0.000129           2        77           shutdown
  0.00    0.000022           0        74           getsockname
  0.00    0.000000           0         1           lseek
  0.00    0.000000           0         5           rt_sigaction
  0.00    0.000000           0         1           rt_sigprocmask
  0.00    0.000000           0         3           setitimer
  0.00    0.000000           0         3           socket
  0.00    0.000000           0         3         3 connect
  0.00    0.000000           0        15           setsockopt
  0.00    0.000000           0         5           getcwd
  0.00    0.000000           0         3           chdir
------ ----------- ----------- --------- --------- ----------------
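For anyone who wants to compare the two summaries programmatically, here is a rough sketch of a parser for the table rows. The `SAMPLE` text is just the first five rows of the empty-cache table above; the function names and the choice of which syscalls count as "metadata" are my own assumptions.

```python
def parse_strace_summary(text):
    """Parse the rows of an `strace -c` table into {syscall: {"seconds", "calls"}}."""
    rows = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows have 5 fields (no errors column) or 6 (with errors).
        if len(fields) not in (5, 6):
            continue
        try:
            seconds = float(fields[1])
            calls = int(fields[3])
        except ValueError:
            continue  # separator line, not data
        rows[fields[-1]] = {"seconds": seconds, "calls": calls}
    return rows


# Path lookups and opens; an assumption about what to group together.
METADATA_CALLS = ("stat", "lstat", "open", "access")


def metadata_share(rows):
    """Return (seconds in metadata calls, seconds in all parsed rows)."""
    total = sum(row["seconds"] for row in rows.values())
    meta = sum(rows[name]["seconds"] for name in METADATA_CALLS if name in rows)
    return meta, total


SAMPLE = """\
 50.78    1.157452         337      3434        28 stat
 32.58    0.742656         628      1182       425 open
  9.29    0.211788         762       278         1 lstat
  3.17    0.072322           0    237865           write
  2.45    0.055839         490       114        13 access
"""

if __name__ == "__main__":
    meta, total = metadata_share(parse_strace_summary(SAMPLE))
    print(f"metadata calls: {meta:.3f}s of {total:.3f}s traced")
```

In both tables the stat, open, lstat and access rows add up to roughly 95% of the traced time. Note that strace -c by default totals time spent in the kernel rather than wall-clock time, which may be part of why these totals are so much smaller than the page load times.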

Mount:

nfs.xxx.xxx.xxx:/path/to/website/files on /path/to/website/files type nfs (rw,hard,intr,noac,vers=4,addr=xx.xx.xx.xx,clientaddr=xx.xx.xx.xx)
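One thing worth checking in that mount line is the noac option (it matches Ac=False in nfsmount.conf). According to nfs(5), noac disables the client's attribute cache, so every attribute check goes back to the server. The sketch below only parses a line of `mount` output and reports whether noac is present; the function names are made up for illustration, and whether dropping noac is acceptable depends on how many clients write to the share.

```python
import re

MOUNT_LINE = (
    "nfs.xxx.xxx.xxx:/path/to/website/files on /path/to/website/files "
    "type nfs (rw,hard,intr,noac,vers=4,addr=xx.xx.xx.xx,clientaddr=xx.xx.xx.xx)"
)


def parse_mount_line(line):
    """Split one line of `mount` output into device, mount point, type and options."""
    match = re.match(r"^(\S+) on (\S+) type (\S+) \(([^)]*)\)\s*$", line)
    if match is None:
        raise ValueError("unrecognised mount line: %r" % line)
    device, mount_point, fs_type, raw_options = match.groups()
    options = {}
    for item in raw_options.split(","):
        if not item:
            continue
        key, _, value = item.partition("=")
        options[key] = value or True  # flags such as "noac" carry no value
    return {
        "device": device,
        "mount_point": mount_point,
        "fs_type": fs_type,
        "options": options,
    }


def attribute_cache_disabled(options):
    """True when the mount carries the noac flag."""
    return "noac" in options


if __name__ == "__main__":
    info = parse_mount_line(MOUNT_LINE)
    print(info["mount_point"], "noac:", attribute_cache_disabled(info["options"]))
```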

Any help is, naturally, appreciated.

Best Answer

To be frank, Drupal on NFS is a real pig. At most, you want to share the "files" director{y,ies} via NFS or something like gluster. The problem with running the DocRoot on NFS is that, added up, all of the lstat(2) and access(2) calls are killer, let alone the getdents(2)s and friends that you'll see in module directories. Something like APC will significantly help the actual read(2) times, as well as removing the compilation delays, but PHP will still do lstat(2) and access(2) on every file. To further speed things up, you can set apc.stat=0, but that won't help you if, as you said, you're constantly changing PHP files, unless you're willing to restart Apache (or manually clear the APC cache via apc.php) every time you make such a change.
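If you do go the apc.stat=0 route, something has to notice that PHP files changed so you know when to restart Apache or clear the APC cache. A minimal sketch of that detection step, assuming you run it wherever the files are edited (not per request); the suffix list and function names are illustrative guesses, and what you do when a change is found is left to you:

```python
import os

# Suffixes treated as PHP source in a Drupal tree; an assumption, adjust as needed.
PHP_SUFFIXES = (".php", ".inc", ".module", ".install")


def snapshot(root):
    """Map each PHP source file under `root` to its mtime in nanoseconds."""
    result = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(PHP_SUFFIXES):
                path = os.path.join(dirpath, name)
                try:
                    result[path] = os.stat(path).st_mtime_ns
                except OSError:
                    continue  # file vanished between listing and stat
    return result


def changed_files(before, after):
    """Paths that were added, removed, or modified between two snapshots."""
    return {
        path
        for path in set(before) | set(after)
        if before.get(path) != after.get(path)
    }
```

Take one snapshot after each cache clear and compare it with a fresh one later; a non-empty result from changed_files() means the opcode cache is stale.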

Best practices recommend storing the DocRoot on either a dedicated, optimized device (such as a SAN), or separately on each webhead. The "files" directory should generally be shared via gluster/nfs/etc., but as an alternative, you can also periodically rsync it between servers, depending on the use case and whether the LB in front supports sticky sessions. You can also eliminate the files directory altogether by use of a CDN or a service like Amazon's S3 or BlackMesh's Swift.
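As a rough illustration of the rsync option, here is a sketch that builds and runs an rsync command to mirror one directory to another. Both paths in the usage comment are hypothetical, and the flags (--archive, --delete, --dry-run) are standard rsync options; run it with dry_run=True first, since --delete removes files on the destination that are missing from the source.

```python
import subprocess


def build_rsync_command(source, destination, delete=True, dry_run=False):
    """Build an rsync argument list that mirrors `source` into `destination`."""
    cmd = ["rsync", "--archive"]
    if delete:
        cmd.append("--delete")  # remove destination files that no longer exist in source
    if dry_run:
        cmd.append("--dry-run")  # report what would change without copying
    # A trailing slash on the source copies its contents, not the directory itself.
    cmd.append(source.rstrip("/") + "/")
    cmd.append(destination.rstrip("/") + "/")
    return cmd


def sync(source, destination, **kwargs):
    """Run rsync and raise CalledProcessError if it exits non-zero."""
    subprocess.run(build_rsync_command(source, destination, **kwargs), check=True)


# Example with hypothetical paths:
# sync("/mnt/nfs/docroot", "/var/www/docroot", dry_run=True)
```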

A hosting provider with detailed, specialized knowledge of Drupal can help you with some of these architectural concerns; you may wish to contact Acquia or BlackMesh (I work for the latter). I don't know if Acquia does, but I know BlackMesh also offers off-site assistance where we work with your existing hosting provider or on-site hosting to optimize the solution for Drupal.

Best of luck with your site!