Optimize Linux file system for reading ~500M small files

filesystems, linux, network-attached-storage, performance, storage

We're seeing performance problems on a file system that contains ~500M relatively small files (typically tens of kB) and ~150K directories. File access is predominantly reads; writes are fairly rare. Files are mostly stored in a hashed directory hierarchy with ~30K files per directory, but in some cases there can be up to 500K files in a single directory.

The server shares the file system to ~10 client machines.

Getting directory listings is often slow, and sometimes reading files by absolute path is slow as well, even locally.
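For illustration, the symptom can be reproduced with something like the following, where the paths are placeholders for one of the hashed directories and a file inside it:

    # Placeholder path for one of the hashed directories.
    DIR=/data/ab/cd

    # Listing without per-entry stat() calls (readdir only):
    time ls -f "$DIR" > /dev/null

    # Listing with per-entry stat() calls -- noticeably slower on large directories:
    time ls -l "$DIR" > /dev/null

    # Cold-cache read of a single file by absolute path:
    echo 3 | sudo tee /proc/sys/vm/drop_caches > /dev/null
    time cat "$DIR/somefile" > /dev/null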

This is a physical server running Ubuntu 12.04.4 (kernel 3.8.0, x86_64), with ext4 on a hardware RAID-6 volume.

What would be a recommended file system setup in this scenario?

  • is there a file system particularly well suited for this case (e.g. ext4, XFS, Btrfs, …)?
  • what kind of RAID configuration (e.g. software vs. hardware RAID, RAID level, etc.) should we use?
  • what about the file-sharing setup (technology [e.g. NFS vs. alternatives], configuration, …)?

Best Answer

When you have a problem like this, you have to:

  • obtain all requirements (latency, bandwidth, redundancy, reliability, security, required features...)
  • analyse the current systems. If there are none, create test environments. Understand how all the components work. Understand the current and the expected load.
  • add system monitoring (with graphs) for both production and test systems. Monitor at least CPU usage, network usage and disk I/O (a minimal monitoring sketch follows this list).
  • create test servers and load test them. Load test with workloads that resemble production, not only with micro-benchmarks (see the fio sketch after this list).
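A minimal monitoring starting point on Ubuntu, assuming the sysstat package is available, could look like this (the tools and intervals are only examples):

    sudo apt-get install sysstat    # provides iostat and sar
    iostat -x 5                     # per-device utilisation, await, request sizes
    sar -n DEV 5                    # per-interface network throughput
    vmstat 5                        # CPU, memory pressure, run queue

For long-term graphs, feed the same counters into whatever graphing system is already in use.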
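For the load test, here is a sketch of a small-file random-read job with fio; the directory, file count and file sizes are assumptions that should be adjusted to mirror the real data set:

    # /data/fio-test, the file count and the sizes are placeholders -- adjust to match production.
    fio --name=smallfile-randread \
        --directory=/data/fio-test \
        --ioengine=psync \
        --rw=randread \
        --bs=32k \
        --filesize=32k \
        --nrfiles=30000 \
        --numjobs=4 \
        --runtime=300 --time_based \
        --group_reporting

Run it on the test server while the monitoring above is active and compare the results with the observed production load.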

Use a stable release of a recent OS with the latest stable kernel.