CentOS – NFS poor write performance

I have two machines connected with 10Gbit Ethernet. Let one of them be the NFS server and the other the NFS client.

Testing network speed over TCP with iperf shows ~9.8 Gbit/s throughput in both directions, so the network is OK.
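
For reference, a typical invocation of such a test (assuming classic iperf; iperf3 takes the same flags here, and 192.168.1.101 is the server address from the mount output below):

iperf -s                  # on the server
iperf -c 192.168.1.101    # on the client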

Testing NFS server's disk performance:

dd if=/dev/zero of=/mnt/test/rnd2 count=1000000

The result is ~150 MBytes/s, so the disk works fine for writing.
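
Note that dd from /dev/zero without a sync flag partly measures the page cache; a variant that forces the data to disk before reporting throughput (GNU dd, same target file) would be:

dd if=/dev/zero of=/mnt/test/rnd2 bs=1M count=512 conv=fdatasync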

Server's /etc/exports is:

/mnt/test 192.168.1.0/24(rw,no_root_squash,insecure,sync,no_subtree_check)
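
The options actually in effect for the export can be double-checked with exportfs from nfs-utils:

exportfs -v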

The client mounts this share to its local /mnt/test with the following options:

node02:~ # mount | grep nfs
192.168.1.101:/mnt/test on /mnt/test type nfs4 (rw,relatime,sync,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=192.168.1.102,local_lock=none,addr=192.168.1.101)

If I download a large file (~5 GB) from the NFS share on the client machine, I get ~130-140 MBytes/s, which is close to the server's local disk performance, so it's satisfactory.

But when I try to upload a large file to the NFS share, the upload starts at ~1.5 MBytes/s, slowly increases up to 18-20 MBytes/s and stops increasing.
Sometimes the share "hangs" for a couple of minutes before the upload actually starts, i.e. traffic between the hosts drops to nearly zero, and if I execute ls /mnt/test, it does not return for a minute or two. Then the ls command returns and the upload starts at its initial ~1.5 MBytes/s.

When the upload speed reaches its maximum (18-20 MBytes/s), iptraf-ng shows ~190 Mbit/s of traffic on the network interface (20 MBytes/s of payload is ~160 Mbit/s, the rest being protocol overhead), so neither the network nor the server's HDD is the bottleneck here.

What I tried:

1.
Set up an NFS server on a third host that was connected via a 100Mbit Ethernet NIC only. The results were analogous: download performs well with nearly full 100Mbit network utilization; upload goes no faster than hundreds of kilobytes per second, leaving network utilization very low (2.5 Mbit/s according to iptraf-ng).

2.
I tried to tune some NFS parameters (an illustrative remount follows this list):

  • sync or async

  • noatime

  • soft instead of hard

  • rsize and wsize are already at their maximum in my examples, so I tried
    decreasing them in several steps down to 8192
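
As an illustration, one of the client-side variations, with block sizes reduced from the 1 MB maximum shown in the mount output above and soft in place of hard:

umount /mnt/test
mount -t nfs4 -o soft,noatime,rsize=8192,wsize=8192 192.168.1.101:/mnt/test /mnt/test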

3.
I tried to swap the client and server machines (set up the NFS server on the former client and vice versa). Moreover, there are six more servers with the same configuration, so I tried mounting them to each other in various combinations. Same result.

4.
MTU=9000; MTU=9000 with 802.3ad link aggregation; link aggregation with MTU=1500.
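
For reference, jumbo frames are set per interface along these lines (eth0 is a placeholder; the switch ports must allow MTU 9000 as well):

ip link set dev eth0 mtu 9000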

5.
sysctl tuning:

node01:~ # cat /etc/sysctl.conf 
net.core.wmem_max=16777216
net.core.rmem_max=16777216
net.ipv4.tcp_rmem= 10240 873800 16777216
net.ipv4.tcp_wmem= 10240 873800 16777216
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_timestamps = 1
net.ipv4.tcp_sack = 1
net.core.netdev_max_backlog = 5000
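
These can be applied without a reboot; since they live in /etc/sysctl.conf as shown, this amounts to:

sysctl -p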

Same result.

6.
Mount from localhost:

node01:~ # cat /etc/exports
/mnt/test *(rw,no_root_squash,insecure,sync,no_subtree_check)
node01:~ # mount -t nfs -o sync localhost:/mnt/test /mnt/testmount/

And here I get the same result: download from /mnt/testmount/ is fast, upload to /mnt/testmount/ is very slow (no faster than 22 MBytes/s), and there is a small delay before the transfer actually starts. Does this mean that the network stack works flawlessly and the problem is in NFS?
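
A minimal way to reproduce the slow upload against the loopback mount (the file name rnd3 is just a placeholder):

dd if=/dev/zero of=/mnt/testmount/rnd3 bs=1M count=1024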

None of this helped; the results didn't differ significantly from the default configuration. echo 3 > /proc/sys/vm/drop_caches was executed before every test.

The MTU of all NICs on all three hosts is 1500, and no non-standard network tuning was performed. The Ethernet switch is a Dell MXL 10/40GbE.

OS is CentOS 7.

node01:/mnt/test # uname -a
Linux node01 3.10.0-123.20.1.el7.x86_64 #1 SMP Thu Jan 29 18:05:33 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

What settings am I missing? How can I make NFS write quickly and without hangs?

Best Answer

You use the sync option in your export statement. This means that the server confirms write operations only after they have actually been written to the disk. Given that you have a spinning disk (i.e. no SSD), this requires on average at least half a revolution of the disk per write operation, which is the cause of the slowdown. At 7200 rpm, for example, half a revolution takes ~4.2 ms, capping a strictly serialized sync workload at roughly 240 write operations per second.

With the async setting, the server acknowledges the write operation to the client as soon as it is processed, but before it is written to the disk. This is a bit less reliable, e.g. after a power failure the client may have received an ack for an operation that never happened. However, it delivers a huge increase in write performance.

(edit) I just saw that you already tested the async vs. sync options. However, I am almost sure that this is the cause of your performance degradation issue; I once saw exactly the same symptom with an identical setup. Maybe test it again. Did you set the async option in the export statement on the server AND in the mount operation on the client at the same time?
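
For reference, a sketch of setting it on both sides at once (paths and addresses taken from the question):

# server: async instead of sync in /etc/exports, then reload the export table
/mnt/test 192.168.1.0/24(rw,no_root_squash,insecure,async,no_subtree_check)
exportfs -ra

# client: remount without the sync option (async is the client-side default)
umount /mnt/test
mount -t nfs4 -o async 192.168.1.101:/mnt/test /mnt/test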
