Linux – max files per directory in ext4

Tags: ext4, filesystems, linux, Ubuntu

I manage an application with a filestore in which each file is stored under a filename equal to its MD5 sum. All files are stored in one directory. Currently there are thousands, but soon there should be millions of files on the server. The current server is running Ubuntu 11.10 on an ext4 filesystem.
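
For concreteness, a minimal sketch of that flat, content-addressed layout, assuming a Python application and a hypothetical store location /var/filestore:

```python
import hashlib
import os

STORE_DIR = "/var/filestore"  # hypothetical location of the flat filestore

def store(data: bytes) -> str:
    """Write a blob under its MD5 hex digest; identical content maps to the same name."""
    name = hashlib.md5(data).hexdigest()
    path = os.path.join(STORE_DIR, name)
    if not os.path.exists(path):
        with open(path, "wb") as f:
            f.write(data)
    return name
```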

Someone told me that it is not wise to put that many files in a single directory, as it significantly increases lookup time and hurts reliability (he had a story about a maximum number of files a single directory could point to, resulting in a big linked list). Instead he suggested creating subdirectories based on, e.g., substrings of the filename. However, this will make some things in my application much more cumbersome.
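
That suggestion amounts to sharding the store on a prefix of the hash. A minimal sketch of what it could look like, again assuming Python and the hypothetical /var/filestore root:

```python
import hashlib
import os

STORE_DIR = "/var/filestore"  # hypothetical root of the filestore

def sharded_path(md5_hex: str) -> str:
    """Map an MD5 name to a two-level subdirectory, e.g. d4/1d/d41d8cd9..."""
    return os.path.join(STORE_DIR, md5_hex[:2], md5_hex[2:4], md5_hex)

def store(data: bytes) -> str:
    """Write a blob under its MD5 hex digest inside its shard directory."""
    name = hashlib.md5(data).hexdigest()
    path = sharded_path(name)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(data)
    return name
```

Two hex characters per level gives 256 × 256 = 65,536 leaf directories, so even tens of millions of files average only a few hundred entries per directory.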

Is this still true, or do modern filesystems (e.g. ext4) have more efficient ways to deal with this and scale naturally? Wikipedia has some details on filesystems, but it doesn't really say anything about the maximum number of files per directory or about lookup times.

Best Answer

ext3 and later filesystems support hashed B-tree directory indexing (the dir_index feature). This scales very well as long as the only operations you perform are add, delete, and access by name. However, I would still recommend breaking the directory down: otherwise you create a dangerous booby trap for tools (updatedb, ls, du, and so on) that perform other operations on directories, which can blow up when a directory has too many entries.
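
If you want to confirm that dir_index is actually enabled on your filesystem, tune2fs -l reports it in the feature list. A small sketch of such a check, assuming Python and a hypothetical device path (tune2fs normally requires root):

```python
import subprocess

def has_dir_index(device: str) -> bool:
    """Return True if the ext2/3/4 filesystem on `device` has dir_index enabled."""
    out = subprocess.run(["tune2fs", "-l", device],
                         capture_output=True, text=True, check=True).stdout
    for line in out.splitlines():
        if line.startswith("Filesystem features:"):
            return "dir_index" in line.split()
    return False

# Example with a hypothetical device:
# print(has_dir_index("/dev/sda1"))
```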