Linux – NFS / DRBD / XFS Performance issues

drbd, linux, nfs, xfs

We have NFS sitting on top of XFS and DRBD, and it delivers horrible performance (about 1 MB/s read/write, as shown in iostat/iotop).
The XFS volume properties are:

meta-data=/dev/drbd0             isize=256    agcount=4, agsize=52427198 blks
         =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=209708791, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal               bsize=4096   blocks=16384, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

The hardware is a Dell box with a SAS1068E controller and two WD 1 TB disks.
The volume is currently mounted with the options:

rw,noatime,nodiratime,attr2,nobarrier,logbufs=8,noquota 

The filesystem contains tons of small files, all about 50-100 KB in size, spread around the directory tree.

We tried playing around with the readahead value (currently disabled) and the XFS mount options, but nothing has helped so far.
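For reference, this is roughly how we have been checking and changing readahead (the device name and the 256-sector value are just examples from our setup):

blockdev --getra /dev/drbd0      # show current readahead, in 512-byte sectors
blockdev --setra 256 /dev/drbd0  # e.g. 128 KB of readahead
blockdev --setra 0 /dev/drbd0    # what we run now: readahead disabled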

We noticed in iotop that kdmflush is the process causing the iowait.
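For completeness, device-level saturation can also be watched with iostat from the sysstat package:

iostat -xm 2    # watch the await and %util columns for the physical disks and drbd0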
Any suggestions to improve the performance of this setup?

Best Answer

The short answer is that your disk system is woefully underspecified for what you're trying to do.

1 MB/s is fairly typical of random IO performance for a RAID1 of SATA disks. For example, see wmarow's IOPS and RAID calculator here. Putting two Barracuda ES.2 SATA disks in a RAID10 (effectively the same as a RAID1), with 100% writes and a 0% write-cache hit rate, shows an estimated 0.57 MB/s of throughput. Real-world performance may differ, but it won't be massively different.
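As a rough sanity check (the per-disk IOPS and request-size figures below are assumptions, not measurements from your box):

~80 random IOPS per 7200rpm SATA disk
RAID1: every write hits both disks, so writes gain nothing from the second spindle
80 IOPS x 8 KB per request  =  ~0.6 MB/s

which lines up with both the calculator's estimate and the throughput you're seeing.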

The fact that you identify kdmflush as the responsible kernel process reinforces this: if your disk system cannot handle the load, more time will be spent in iowait in that process. kdmflush is the device-mapper flush worker, which handles work deferred because of load elsewhere.

There are a few ways of improving this: get more disks, get faster disks, or turn on write caching on the controller.

If you turn on write caching, you will want a battery backup unit (BBU) as well. A BBU may not be an option for the onboard SAS1068E, though, so you may have to get a PCIe controller.
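As a side note, it is worth checking whether the drives' own volatile write cache is currently enabled; that is separate from the controller cache, and carries the same data-loss risk without a BBU. One way to check, assuming the disks show up as /dev/sdX:

sdparm --get=WCE /dev/sda   # WCE=1 means the drive's write cache is on
hdparm -W /dev/sda          # equivalent query for SATA drives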

I saw abysmal performance with DRBD when the RAID controllers I was using (3ware 9550, I believe) did not have write caching enabled. Your DRBD load will be mostly random IO, so write caching will have a significant effect on performance.
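Related to that: once you do have a battery- or flash-backed write cache, DRBD's own flushes and barriers can be relaxed so the cache actually gets used. With DRBD 8.3 that looks roughly like the sketch below (the resource name is just an example, and in 8.4 these options became the disk-barrier/disk-flushes/md-flushes settings). Do not do this without a protected cache:

resource r0 {
  disk {
    no-disk-barrier;
    no-disk-flushes;
    no-md-flushes;
  }
}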

The SAS1068E is very low end, and may also be contributing to the problem. If you get more disks or better disks, I'd suggest getting a better controller as well.

A quick Google search turns up reports of similarly poor performance with the same model of RAID controller you are using.