Optimal Directory Depth vs Number of Files in a Directory for ext3

Tags: ext3, filesystems, performance

For accessing files on ext3 when dir_index is not being used, what is the optimal directory depth versus the number of files per directory? Does file size affect this? The total number of files might be a factor, but I think there should still be an equation…

If you don't have benchmarks to back it up, I would still be interested in what you think might be optimal, and why. Maybe certain system calls take longer, or maybe your computer science knowledge suggests an answer. Examples from other file systems could be very interesting too, but I want to know the answer without a separate indexing mechanism such as the dir_index tune2fs option.

I have seen this question danced around and wondered about the answer, but never found it. At this point, in practice, a database may very well be the answer. However, I still want to know what the answer would be for the file system.

Best Answer

For accessing files on ext3 when dir_index is not being used, what is the optimal directory depth versus the number of files per directory?

You'll want to run your own benchmarks for this.
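A minimal benchmark sketch, assuming Python 3 on Linux; `mean_miss_lookup_seconds` is a hypothetical helper, not an existing tool. It deliberately times lookups of names that do not exist: files the script has just created would be answered from the kernel's dentry cache, whereas each new missing name forces the filesystem to search the directory, which on ext3 without dir_index is a linear scan of the entries. Point `base_dir` at a directory on the ext3 filesystem you care about, since the system temp directory may be on a different filesystem (tmpfs, for example).

```python
import os
import tempfile
import time
from typing import Optional


def mean_miss_lookup_seconds(num_files: int, num_lookups: int = 2000,
                             base_dir: Optional[str] = None) -> float:
    """Fill one directory with num_files empty files, then return the mean
    time in seconds to look up a name that does NOT exist in it.

    Each missing name is unique, so it is not in the dentry cache and the
    filesystem has to search the directory for it. On ext3 without
    dir_index that search is a linear scan of every directory entry.
    """
    with tempfile.TemporaryDirectory(dir=base_dir) as d:
        for i in range(num_files):
            # Empty files: file size plays no part in a name lookup.
            open(os.path.join(d, f"f{i:08d}"), "w").close()
        missing = [os.path.join(d, f"missing{i:08d}") for i in range(num_lookups)]
        start = time.perf_counter()
        for path in missing:
            os.path.exists(path)  # stat() a name that is not there
        elapsed = time.perf_counter() - start
    return elapsed / num_lookups


if __name__ == "__main__":
    # base_dir=None uses the system temp dir; pass a path on the ext3
    # filesystem you actually want to measure.
    for n in (100, 1_000, 10_000):
        print(f"{n:>6} files: {mean_miss_lookup_seconds(n) * 1e6:.1f} us per lookup")
```

Repeat with larger `num_files` values to see where lookup time starts to climb on your own hardware and I/O load.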

Does file size affect this? The total number of files might be a factor, but I think there should still be an equation...

File size does not affect this. Lookup cost depends on the number of directory entries (the per-file header entries) in the directory being searched, for whatever filesystem you're using.

If you don't have benchmarks to back it up, I would still be interested in what you think might be optimal, and why.

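About 32,000 is pretty much the upper limit (strictly, ext3's hard limit of 32,000 links per inode caps a directory at roughly 32,000 subdirectories; plain files are not capped there). From my own empirical experience, though, I suggest keeping each directory under 10,000 files unless you want to wait a minute or two. A few thousand files can be handled in about 5-20 seconds, depending on I/O, server load, and so on; a few hundred, almost instantaneously.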
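One common way to act on the under-10,000-files guideline is to spread files across nested subdirectories named after a hash of the file name. This sketch is an assumption, not something the answer prescribes: it picks MD5 and a fanout of 256 per level (two hex characters), and `levels_needed` and `hashed_path` are hypothetical helper names. A fanout of 256 also stays far below ext3's limit of roughly 32,000 subdirectories per directory.

```python
import hashlib
import os

FANOUT = 256  # two hex characters per directory level -> 256 subdirectories


def levels_needed(total_files: int, max_per_dir: int = 10_000) -> int:
    """Smallest number of hashed directory levels that keeps the *average*
    leaf directory at or below max_per_dir files."""
    levels = 0
    while total_files / FANOUT ** levels > max_per_dir:
        levels += 1
    return levels


def hashed_path(root: str, name: str, levels: int) -> str:
    """Nest `name` under `levels` directories named by successive hex pairs
    of the MD5 digest of the name, i.e. root/xx/yy/name for levels=2."""
    if not 0 <= levels <= 16:  # an MD5 hex digest has 32 chars = 16 pairs
        raise ValueError("levels must be between 0 and 16")
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    parts = [digest[2 * i:2 * i + 2] for i in range(levels)]
    return os.path.join(root, *parts, name)


if __name__ == "__main__":
    for n in (5_000, 1_000_000, 100_000_000):
        print(n, "files ->", levels_needed(n), "level(s)")
    # "/srv/files" is a hypothetical storage root.
    print(hashed_path("/srv/files", "report.txt", levels_needed(1_000_000)))
```

With 1,000,000 files this picks one level, so the average leaf directory holds about 3,900 files (1,000,000 / 256), comfortably under the suggested 10,000.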

Follow-up edit (in response to a comment):

Having 8 directories of 2,500 files each is far better than having 2 directories of 10,000 files each: both hold the same 20,000 files, but each lookup searches a directory a quarter of the size. The secret is reducing the search time within each directory.
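To make the 8 × 2,500 versus 2 × 10,000 comparison concrete, here is an idealized model (an assumption, not a measurement): every directory on the path is an unsorted linear list, as on ext3 without dir_index, and a successful lookup examines on average (n + 1) / 2 of its n entries. It counts entry comparisons only and ignores disk reads, caching, and the `.`/`..` entries; `expected_entries_scanned` is a hypothetical name.

```python
from typing import Sequence


def expected_entries_scanned(level_sizes: Sequence[int]) -> float:
    """Expected directory entries examined to find one file, where
    level_sizes[i] is the number of entries in the directory searched at
    depth i. A linear directory of n entries costs (n + 1) / 2 on average
    when the target is equally likely to be any of its entries."""
    return sum((n + 1) / 2 for n in level_sizes)


print(expected_entries_scanned([8, 2_500]))   # 8 dirs x 2,500 files -> 1255.0
print(expected_entries_scanned([2, 10_000]))  # 2 dirs x 10,000 files -> 5002.0
```

Both layouts hold 20,000 files, but the first examines roughly a quarter as many entries per lookup; the extra top-level directory costs only a handful of comparisons.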

Strangely enough, I just posted a similar answer to a similar question here.
