Linux: Set RAID5 chunk size to 4 kB to match FS block size and physical disk sector size

linux mdadm software-raid

There are many discussions and websites that explain the process of setting up a Linux software RAID with mdadm, and most of them recommend a chunk size of 128 kB or 512 kB for a new array. Serverfault is no exception.

I am now building a new media NAS, though, and I see no good reason not to use a chunk size of 4 kB. Each of the four physical disks in the RAID-5-to-be has 4 kB sectors. Surely a 4 kB chunk size makes the most sense, mapping a 1:1 relationship from the RAID volume down to the disk sectors? Then, on top of that, create the file system (which will be ext4) with a 4 kB block size, as sketched below.
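For concreteness, here is a rough sketch of what I have in mind (the device names are just placeholders for my four disks, and mdadm's --chunk value is given in kB):

mdadm --create /dev/md0 --level=5 --raid-devices=4 --chunk=4 /dev/sdb /dev/sdc /dev/sdd /dev/sde
# ext4 alignment hints: stride = chunk / block = 4 kB / 4 kB = 1,
# stripe-width = stride x 3 data disks = 3
mkfs.ext4 -b 4096 -E stride=1,stripe-width=3 /dev/md0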

How does a larger chunk size, say 128 kB, become more beneficial when the disks only have 4 kB sectors?

Best Answer

This is related to read-ahead. Rotating drives suffer from extremely slow access times, so you want to minimize the number of accesses and read sequentially as much as possible. To achieve this, Linux uses a default read-ahead value of 128 kB, which means that every time you request a 1 kB block, up to 128 kB is actually read and cached.

Check your read-ahead setting with

cat /sys/block/sda/queue/read_ahead_kb 
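The array device has its own read-ahead value as well, separate from the member disks. Assuming the array shows up as /dev/md0 (adjust to your setup), it can be checked the same way:

cat /sys/block/md0/queue/read_ahead_kb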

Actually, this 128 kB value is extremely conservative and is better suited to old ATA drives from ten years ago with 512 kB of cache. For modern drives with 64 MB of cache, a value of 1 or 2 MB would probably be a better fit. For hardware RAID with large caches, values of 64 MB or more are preferable.

Don't forget to play with the read-ahead settings to see how they affect your performance:

echo 1024 > /sys/block/sda/queue/read_ahead_kb
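The value is in kB, so the line above sets 1 MB of read-ahead on sda, and it does not persist across reboots. As a sketch, assuming the array is /dev/md0, the same tuning can be applied to the array device with blockdev, which takes the value in 512-byte sectors instead:

# 2048 sectors x 512 bytes = 1 MB of read-ahead on the array device
blockdev --setra 2048 /dev/md0
# confirm the current setting (reported in sectors)
blockdev --getra /dev/md0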