CentOS: How to improve NFS listing performance

centos networking nfs

NFS has been working great for me for the last few years, but I'm now confronted with a performance issue to which I can't find a solution.

My problem is that the NFS server holds around 5 GB of small files, and when I run an "ls" or a "du" on the mounted directory from a client, it can take more than 2 minutes to list all the files.

I suspect the problem is that for every single file, NFS sends a query for the file's attributes, waits for the response, and only then sends the query for the next file. If that is the case, the round trips alone would explain the poor performance.
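If it helps, here is roughly how I've been trying to confirm that this is what's happening (the mount point /mnt/nfs is only an example, adjust to your setup): compare the client RPC counters before and after a listing, or count the stat-family syscalls that ls itself makes.

nfsstat -c                               # note the current getattr/lookup counters
ls -la /mnt/nfs > /dev/null              # list the slow directory
nfsstat -c                               # getattr grows by roughly one per file

strace -c ls -la /mnt/nfs > /dev/null    # per-syscall counts; look at the stat family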

Now, I tried looking for a solution to this, but I did not manage to find one, so I decided to open this thread.

Do any of you have an idea of how I could fix this performance issue?

Many thanks from a Linux sysadmin padawan.

Best Answer

My feeling is that this problem isn't specific to NFS. Historically, UNIX filesystems in general have had problems with flat directories containing large numbers of files. Certainly, the rule of thumb I was told many years ago is that performance degrades with the square of the size of the directory file. As you point out, doing an ls -la means stat()ing each inode, and that takes a lot of time once the directory file starts to grow; the latency added by NFS will exacerbate this, but it's only bringing an underlying problem to your attention, not causing it.
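You can see how much of the cost is the per-file stat() by comparing an unsorted, names-only listing with a long one on the same directory (the path is just an example; ls -f skips sorting and, as long as nothing like --color forces it, the per-entry stat() as well):

time ls -f /path/to/bigdir > /dev/null     # readdir only, no per-file stat
time ls -la /path/to/bigdir > /dev/null    # readdir plus one stat per entry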

The solution, as I constantly tell my developers, is not to store large numbers of files in shallow, wide structures, but in narrow, deep ones.

Look at how existing utilities organise things when they have many files to store: yum keeps a lot of small files under /var/lib/yum/yumdb, so it stores them in subdirectories keyed by leading initial:

drwxr-xr-x.   4 root root  4096 Sep  9  2011 C
drwxr-xr-x.   3 root root  4096 Sep  9  2011 M
drwxr-xr-x.   3 root root  4096 Jul 13 10:05 S
drwxr-xr-x.  24 root root  4096 Jul 13 10:05 a
drwxr-xr-x.  18 root root  4096 Nov  7 11:10 b
[c through y omitted to save space]
drwxr-xr-x.   5 root root  4096 Dec 28  2011 z

Squid cache, when initialised with squid -z, makes /var/spool/squid/0[0-F], and under each of these makes subdirectories ./[0-F][0-F]. innd pulls a similar trick, if memory serves, when it isn't using a ring-buffer-type file structure. All these daemons, and many other similar ones, know that if they need to store lots of small files, having a deep set of subdirectories to store them in is essential to efficient operation.
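If restructuring your data is an option, the same trick is easy to retrofit. Here is a rough bash sketch of the yum-style scheme, fanning an existing flat directory out into per-initial subdirectories (the path is only an example, and I would test it on a copy first):

cd /srv/data/flatdir || exit 1
for f in *; do
    [ -f "$f" ] || continue    # only move regular files
    d=${f:0:1}                 # leading character becomes the bucket directory
    mkdir -p -- "$d"
    mv -- "$f" "$d/"
done

Anything that reads the files afterwards just has to derive the bucket from the first character of the name before opening the file.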

Edit: 1s is a very long time to take to do an ls on a single local directory. As I said, I think the NFS latency is exacerbating your issue; but it isn't responsible for the problem, only for making it big enough to cause you grief.