Linux – filesystem for millions of small files

Tags: benchmark, filesystems, linux

Which Linux filesystem would you choose for best speed in the following scenario:

  • a hundred million files
  • ~2k file size on average
  • >95% read access
  • pretty random access
  • high concurrency (>100 processes)

Note: The files are stored in a deep hierarchical tree to avoid large directories. Each leaf directory contains around one thousand files.

How would you benchmark it?
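
For reference, here is a minimal Python sketch of how one might generate a test tree matching this layout. The mount point, depth, and fan-out values are illustrative assumptions, and the counts are scaled down; raise DEPTH or FANOUT to approach the real 100-million-file scale.

```python
#!/usr/bin/env python3
"""Sketch: build a deep tree of small files for benchmarking.
All paths and sizing constants below are hypothetical."""
import os

ROOT = "/mnt/test/tree"   # assumed mount point of the filesystem under test
DEPTH = 3                 # directory levels above the leaves
FANOUT = 10               # subdirectories per level (10^3 = 1000 leaves here)
FILES_PER_LEAF = 1000     # ~1000 files per leaf directory, per the question
FILE_SIZE = 2048          # ~2 KB average file size, per the question

def make_leaf(path):
    """Create one leaf directory full of small random-content files."""
    os.makedirs(path, exist_ok=True)
    for i in range(FILES_PER_LEAF):
        with open(os.path.join(path, f"f{i:04d}"), "wb") as f:
            f.write(os.urandom(FILE_SIZE))

def walk(prefix, depth):
    """Recurse down DEPTH levels, then populate a leaf."""
    if depth == 0:
        make_leaf(prefix)
        return
    for i in range(FANOUT):
        walk(os.path.join(prefix, f"d{i:02d}"), depth - 1)

if __name__ == "__main__":
    walk(ROOT, DEPTH)  # this configuration yields ~1 million files
```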

Best Answer

In terms of random seeks, ReiserFS wins, followed by ext4, followed by JFS. I'm not sure this correlates exactly with directory-lookup performance, but it seems like a reasonable indicator; you'll have to run your own tests for that specifically. ext2 beats the pants off everything for file-creation times, likely due to its lack of a journal, though ext4 beats everything except ReiserFS, which you may not want to use given Hans Reiser's current status.
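
To run your own tests, here is a rough sketch of a concurrent random-read benchmark, assuming a test tree like the one described in the question. The root path, worker count, and read counts are all illustrative, and you would want to drop the page cache between runs (echo 3 > /proc/sys/vm/drop_caches, as root) so you measure disk seeks rather than cached reads.

```python
#!/usr/bin/env python3
"""Sketch: many processes doing random reads over a file tree.
Paths and constants are hypothetical, not from the original post."""
import os
import random
import time
from multiprocessing import Pool

ROOT = "/mnt/test/tree"   # assumed mount point of the filesystem under test
WORKERS = 100             # matches the >100-process concurrency in the question
READS_PER_WORKER = 1000

_paths = []               # populated in each worker by init()

def init(paths):
    global _paths
    _paths = paths

def worker(seed):
    """Open and fully read READS_PER_WORKER random files; return elapsed seconds."""
    rng = random.Random(seed)
    start = time.monotonic()
    for _ in range(READS_PER_WORKER):
        with open(rng.choice(_paths), "rb") as f:
            f.read()
    return time.monotonic() - start

if __name__ == "__main__":
    # Walking 100 million files is itself slow; for a real run, sample a subset.
    paths = [os.path.join(d, n)
             for d, _, names in os.walk(ROOT) for n in names]
    with Pool(WORKERS, initializer=init, initargs=(paths,)) as pool:
        times = pool.map(worker, range(WORKERS))
    total = WORKERS * READS_PER_WORKER
    # Workers start roughly together, so the slowest one bounds throughput.
    print(f"~{total / max(times):.0f} reads/s across {WORKERS} workers")
```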

You might also want to look into drives that support NCQ (Native Command Queuing), and make sure your install is set up to use it. Under heavy seeking, it should provide a speed boost.
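
As a quick sanity check, something like the sketch below reads the advertised queue depth from sysfs on Linux. The /sys/block/<dev>/device/queue_depth path applies to SATA/SCSI devices; a depth greater than 1 (typically 31 or 32 for SATA NCQ) suggests command queuing is active, while 1 means it's off or unsupported.

```python
#!/usr/bin/env python3
"""Sketch: report per-device queue depth from sysfs (Linux only)."""
from pathlib import Path

for dev in Path("/sys/block").iterdir():
    qd = dev / "device" / "queue_depth"
    if qd.exists():
        print(f"{dev.name}: queue_depth={qd.read_text().strip()}")
```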

Lastly, make sure your machine has a ton of RAM. Since the files aren't updated often, Linux will end up caching most of them in RAM if it has free space. If your usage patterns are right, this will give you a massive speed boost.
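
To see the size of that effect for yourself, here is a rough sketch that times a cold read against a warm one on a single file. The file path is illustrative, and os.posix_fadvise with POSIX_FADV_DONTNEED (Linux/POSIX only) is used to evict the file's pages so the first read actually hits the disk.

```python
#!/usr/bin/env python3
"""Sketch: cold vs. warm (page-cache) read latency for one small file."""
import os
import time

PATH = "/mnt/test/tree/d00/d00/d00/f0000"  # any file from the test tree

def read_once(path):
    """Read the whole file and return the elapsed time in seconds."""
    start = time.monotonic()
    with open(path, "rb") as f:
        f.read()
    return time.monotonic() - start

# Ask the kernel to drop this file's cached pages (length 0 = to end of file),
# so the first read below is served from disk.
fd = os.open(PATH, os.O_RDONLY)
os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
os.close(fd)

cold = read_once(PATH)   # served from disk
warm = read_once(PATH)   # served from the page cache
print(f"cold: {cold * 1e6:.0f} us, warm: {warm * 1e6:.0f} us")
```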