Linux – How to work around the Linux subdirectory number limit

directory, filesystems, linux, scalability

I have a website which will store user profile images. Each image is stored in a directory (Linux) specific to the user. Currently I have a customer base of 30+, which means I will have 30+ folders. But my current Linux box (ext2/ext3) doesn't support creating more than 32,000 directories. How do I get past this? Even the YouTube guys ran into the same problem, with video thumbnails, but they solved it by moving to ReiserFS. Can't we have a better solution?

Update: When asked in IRC, people suggested upgrading to ext4, which has a 64k limit (and of course you can get past even that too), or kernel hacking to change the limit.

Update: How about splitting the user base into folders based on userid ranges? Meaning 1–1000 in one folder, 1001–2000 in the next, and so on. This seems simple. What do you say, guys?
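Something like this is what I have in mind, as a rough sketch, assuming integer user IDs (bucket_path and the bucket size of 1000 are just illustrative):

import os

def bucket_path(base, userid, bucket_size=1000):
    # Users 1-1000 land in bucket "0", 1001-2000 in bucket "1", and so on.
    bucket = (userid - 1) // bucket_size
    return os.path.join(base, str(bucket), str(userid))

print(bucket_path("/var/www/profiles", 1337))  # /var/www/profiles/1/1337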

Frankly, isn't there any other way?

Best Answer

That limit is per-directory, not for the whole filesystem, so you can work around it by subdividing things further. For instance, instead of having all the user subdirectories in the same directory, split them by the first two characters of the name, so you have something like:

top_level_dir
|---aa
|   |---aardvark1
|   |---aardvark2
|---da
|   |---dan
|   |---david
|---do
    |---don
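Path construction for that layout is trivial; here is a minimal sketch in Python (prefix_path and the paths are just illustrative names):

import os

def prefix_path(base, username):
    # The first two characters of the username pick the subdirectory.
    return os.path.join(base, username[:2], username)

print(prefix_path("/srv/users", "aardvark1"))  # /srv/users/aa/aardvark1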

Even better would be to create some form of hash of the names and use that for the division. That way you get a better spread amongst the directories, instead of "da" being very full and "zz" completely empty, as in the initial-letters example. For instance, if you take the CRC or MD5 of the name and use the first 8 bits, you'll get something like the layout below (a sketch of the path construction follows the diagram):

top_level_dir
|---00
|   |---some_username
|   |---some_username
|---01
|   |---some_username
...
|---FF
|   |---some_username
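A rough sketch of the hashed variant, assuming MD5 (hashed_path is an illustrative name; the first two hex characters of the digest are exactly the first 8 bits):

import hashlib
import os

def hashed_path(base, username):
    # The first 8 bits of the MD5 digest = the first two hex characters,
    # giving 256 roughly evenly filled top-level buckets (00 .. ff).
    digest = hashlib.md5(username.encode("utf-8")).hexdigest()
    return os.path.join(base, digest[:2], username)

print(hashed_path("/srv/users", "aardvark1"))  # e.g. /srv/users/3f/aardvark1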

This can be extended to further depths as needed, for instance like this if using the username rather than a hash value (again, a sketch follows the diagram):

top_level_dir
|---a
|   |---a
|       |---aardvark1
|       |---aardvark2
|---d
    |---a
    |   |---dan
    |   |---david
    |---o
        |---don
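The deeper, character-based variant, sketched the same way (nested_path and the depth of 2 are illustrative; slicing keeps names shorter than the depth from breaking):

import os

def nested_path(base, username, depth=2):
    # One directory level per leading character: "dan" -> d/a/dan.
    levels = list(username[:depth])
    return os.path.join(base, *levels, username)

print(nested_path("/srv/users", "dan"))  # /srv/users/d/a/dan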

This method is used in many places, such as Squid's cache (to copy Ludwig's example) and the local caches of web browsers.

One important thing to note is that with ext2/3 you will start to hit performance problems well before you get close to the 32,000 limit anyway, as directories are searched linearly. Moving to another filesystem (ext4 or ReiserFS, for instance) will remove this inefficiency (ReiserFS searches directories with a binary-split algorithm, so long directories are handled much more efficiently; ext4 may do so too) as well as the fixed limit per directory.
