Linux – Poor NFS performance when sequentially reading large files

Tags: hard-drive, linux, nfs, performance, redhat

I have an NFSv3 server with multiple clients. Each client is sequentially reading a different large file, and performance is very poor.

Here is what I am observing in iostat on the server for the disk where the files reside:

Device:  rrqm/s  wrqm/s      r/s   w/s   rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  svctm   %util
sdX       24.33    0.00   712.67  0.00   18.41   0.00     52.91     11.95  16.91   1.40  100.00

As you can see, %util is 100%. At the same time, the aggregate I/O throughput (rMB/s + wMB/s) is about 18 MB/s, which is 10-20 times slower than the sequential throughput the disk is capable of.

This, and the ratio of rMB/s to r/s, lead me to conclude that instead of reading large chunks of each file at a time, NFS ends up reading the files in smallish chunks, with lots of interleaving of chunks from different files. This in turn leads to lots of disk seeks, killing performance.
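To quantify "smallish": iostat reports avgrq-sz in 512-byte sectors, so the average read works out to

52.91 sectors x 512 B ≈ 26 KB per request
(cross-check: 18.41 MB/s ÷ 712.67 r/s ≈ 26 KB per request)

i.e. right around the 32KB rsize limit mentioned below, so each NFS read apparently reaches the disk as its own small request.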

Would you say the conclusion is justified by the evidence?

What would you recommend as a way to address this? I can change the reading application, and I can tweak NFS settings on both the server and the clients. I am using Red Hat 5.6 with kernel 2.6.18, which I believe limits rsize to 32KB (I'd be happy to be proven wrong on this).
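For concreteness, these are the kinds of knobs I mean (server:/export, /mnt/data and /dev/sdX are placeholders):

# Client side: request the largest rsize the kernel will grant (capped at 32KB here)
mount -t nfs -o rsize=32768,wsize=32768,hard,intr server:/export /mnt/data

# Server side: raise readahead on the underlying disk so each small NFS read
# pulls a larger sequential window into the page cache
# (the value is in 512-byte sectors; 16384 sectors = 8MB)
blockdev --setra 16384 /dev/sdX
blockdev --getra /dev/sdX    # verify the new setting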

edit: This is how things look when there's only a single client reading a single file:

Device:  rrqm/s  wrqm/s      r/s   w/s   rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  svctm   %util
sdX      343.33    0.33  1803.33  0.67  105.78   0.00    120.09      0.91   0.50   0.31   56.47

As you can see, the throughput is far better (~106 MB/s vs. ~18 MB/s), and %util is also much lower.
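By the same arithmetic as above, the single-client reads average 120.09 sectors x 512 B ≈ 60 KB per request, and await drops from ~17 ms to 0.5 ms, which is what you would expect once the head stops seeking between files.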

Best Answer

Faster disks, more memory in the box. I think your conclusion is right: you're seek-bound.

How much memory does your NFS server have vs. your working set? Will your working set fit into cache?
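A rough way to check this (the paths are placeholders for whatever the clients are reading):

# Total RAM vs. how much of it is currently sitting in the page cache
free -m

# Total size of the working set, to compare against that cache
du -ch /export/bigfile1 /export/bigfile2 | tail -1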

What is the backend storage? You say it does ~180-360MB/sec of sequential throughput, but how does it perform for random I/O? I'd suggest using something like fio to get an idea; seekwatcher is also fun for visualizing the I/O. But the more you can avoid hitting the disks at all, the better.
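Something like this fio job would approximate your multi-client pattern (a sketch; /export/fio-test is a placeholder, and --bs=32k mirrors the 32KB rsize):

# Four concurrent sequential readers on the same disk, 32KB at a time,
# with O_DIRECT so the page cache doesn't mask the disk's own behaviour
fio --name=nfs-sim --ioengine=sync --rw=read --bs=32k --direct=1 \
    --numjobs=4 --size=2g --directory=/export/fio-test --group_reporting

Compare the aggregate bandwidth against a single-reader run (--numjobs=1) to see the seek penalty directly.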