I agree that it may be related to stripe alignment. In my experience, creating an unaligned XFS filesystem on a 3x2TB RAID-0 takes ~5 minutes, but if it is aligned to the stripe size it takes ~10-15 seconds. Here is a command for aligning XFS to a 256KB stripe size:
mkfs.xfs -l internal,lazy-count=1,sunit=512 -d agsize=64g,sunit=512,swidth=1536 -b size=4096 /dev/vg10/lv00
BTW, the stripe width in my case is 3 units, which will be the same for you with 4 drives in RAID-5 (one drive's worth of capacity goes to parity, leaving 3 data-bearing drives).
Obviously, this also improves filesystem performance, so you'd better keep it aligned.
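If your geometry differs, the sunit/swidth pair can be derived with a little sector arithmetic - a sketch, assuming a 256KB chunk and 3 data disks as in the command above:

```shell
# Sketch: derive mkfs.xfs sunit/swidth from the array geometry.
# STRIPE_KB and DATA_DISKS are placeholders -- set them for your own array.
STRIPE_KB=256    # per-disk stripe (chunk) size in KB
DATA_DISKS=3     # data-bearing disks: 3 for a 3-disk RAID-0,
                 # and also 3 for a 4-disk RAID-5 (one disk's worth is parity)

# mkfs.xfs takes sunit/swidth in 512-byte sectors
SUNIT=$(( STRIPE_KB * 1024 / 512 ))
SWIDTH=$(( SUNIT * DATA_DISKS ))
echo "sunit=$SUNIT swidth=$SWIDTH"   # matches the sunit=512,swidth=1536 above
```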
The logic for when Linux applies read-ahead is complicated. Starting in 2.6.23 there's the really fancy On-Demand Readahead; before that, a simpler prediction mechanism was used. The design goals of read-ahead have always included not doing read-ahead unless the read access pattern justifies it. So the idea that the stripe size is a relevant piece of data here is fundamentally unsound. Individual reads at that end of the file I/O range, below the stripe size, aren't normally going to trigger the read-ahead logic anyway. Tiny read-ahead values effectively turn the feature off, and you don't want that.
When you really are doing sequential I/O to a large RAID10 array, the only way to reach full throughput on many systems is to have read-ahead working for you; otherwise Linux won't dispatch requests fast enough to keep the array reading at its full potential. The last few times I tested larger RAID10 arrays, in the 24-disk range, large read-ahead settings (>=4096 sectors = 2048KB) gave 50 to 100% performance gains on sequential I/O, as measured by dd or bonnie++. Try it yourself: run bonnie++, increase read-ahead a lot, and see what happens. If you have a large array, that will quickly dispel the idea that read-ahead numbers smaller than typical stripe sizes make any sense.
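A sketch of that experiment (/dev/md0 here is a placeholder device - run it against your own array, as root, and compare the dd throughput before and after the --setra):

```shell
# Sketch of the read-ahead experiment; /dev/md0 is a placeholder device.
RA=4096                                   # read-ahead, in 512-byte sectors
echo "that is $(( RA * 512 / 1024 ))KB of read-ahead"

DEV=/dev/md0
if [ "$(id -u)" -eq 0 ] && [ -b "$DEV" ]; then
    blockdev --setra "$RA" "$DEV"         # raise read-ahead on the device
    echo 3 > /proc/sys/vm/drop_caches     # drop the page cache for a fair test
    # sequential read test; compare the MB/s figure at different RA values
    dd if="$DEV" of=/dev/null bs=1M count=4096
fi
```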
The Linux kernel is so aware of this necessity that it even automatically increases read-ahead for you when you create some types of arrays. Check out this example from a system with a 2.6.32 kernel:
[root@toy ~]# blockdev --report
RO    RA   SSZ   BSZ  StartSec          Size  Device
rw   256   512  4096         0  905712320512  /dev/md1
rw   768   512   512         0  900026204160  /dev/md0
Why is read-ahead 256 (128KB) on md1 while it's 768 (384KB) on md0? That's because md0 is a 3-disk RAID0, and Linux increases read-ahead knowing it has no hope of achieving full speed across an array of that size with the default of 256. Even that's actually too low; it needs to be 2048 (1024KB) or larger to hit the maximum speed that small array is capable of.
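Those KB figures are just sector arithmetic - blockdev reports read-ahead in 512-byte sectors:

```shell
# blockdev reports RA in 512-byte sectors; convert the values above to KB
for ra in 256 768 2048; do
    echo "$ra sectors = $(( ra * 512 / 1024 ))KB"
done
```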
Much of the lore on low-level RAID settings like stripe sizes and alignment is just that: lore, not reality. Run some benchmarks at a few read-ahead settings yourself, see what happens, and then you'll have known good facts to work with instead.
Best Answer
The existing answers are quite outdated. Here in 2020, it's now possible to grow an mdadm software RAID 10 simply by adding 2 or more same-sized disks.

Creating the example RAID 10 array
For testing purposes, instead of physical drives, I created 6x 10GB LVM volumes, /dev/vg0/rtest1 to rtest6 - which mdadm had no complaints about.

Next, I created a RAID 10 mdadm array using the first 4 rtestX volumes.

Using mdadm -D (equal to --detail), we can see the array has 4x "drives", with a capacity of 20GB out of the 40GB of volumes, as is expected with RAID 10.

Expanding the RAID 10 with 2 new equal-sized volumes/disks
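A sketch of the creation steps just described - the /dev/md0 array name is my assumption, and the lvcreate/mdadm invocations are standard usage rather than the author's exact commands:

```shell
# Sketch of the setup above -- /dev/md0 is an assumed array name, and the
# lvcreate/mdadm flags are standard usage, not the author's exact commands.
if [ "$(id -u)" -eq 0 ] && [ -e /dev/vg0 ]; then
    for i in 1 2 3 4 5 6; do
        lvcreate -L 10G -n "rtest$i" vg0             # 6x 10GB test volumes
    done
    # build the RAID 10 from the first four volumes
    mdadm --create /dev/md0 --level=10 --raid-devices=4 \
        /dev/vg0/rtest1 /dev/vg0/rtest2 /dev/vg0/rtest3 /dev/vg0/rtest4
    mdadm -D /dev/md0                                # --detail: 4 devices, ~20GB usable
fi
echo "expected usable capacity: $(( 4 * 10 / 2 ))GB" # RAID 10 halves raw capacity
```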
To grow the array, first you need to --add the pair(s) of disks to the array, then use --grow --raid-devices=X (where X is the new total number of disks in the RAID) to request that mdadm reshape the RAID 10 to use the 2 spare disks as part of the array.

Monitor the resync process
Here's the boring part - waiting for anywhere from minutes, to hours, to days or even weeks depending on how big your RAID is, until mdadm finishes reshaping around the new drives.
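The add/grow/monitor sequence above might look like this - again, /dev/md0 and the rtestX names are illustrative:

```shell
# Sketch of the grow-and-monitor steps -- /dev/md0 is an assumed array name.
NEW_DEVICES=6                                   # 4 existing + the 2 new volumes
if [ "$(id -u)" -eq 0 ] && [ -b /dev/md0 ]; then
    mdadm --add /dev/md0 /dev/vg0/rtest5 /dev/vg0/rtest6   # add the pair as spares
    mdadm --grow /dev/md0 --raid-devices="$NEW_DEVICES"    # reshape onto all 6
    cat /proc/mdstat                            # shows reshape progress and ETA
fi
echo "reshaping to $NEW_DEVICES devices"
```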
If we check mdadm -D, we can see the RAID is currently reshaping.

Enjoy your larger RAID 10 array!

Eventually, once mdadm finishes reshaping, we can see that the array size is ~30G instead of ~20G, meaning the reshape was successful and relatively painless to do :)
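The ~20G and ~30G figures are plain RAID 10 mirroring arithmetic - usable capacity is half the raw capacity:

```shell
# RAID 10 keeps two copies of everything, so usable capacity = raw / 2
DISK_GB=10
echo "before: $(( 4 * DISK_GB / 2 ))GB, after: $(( 6 * DISK_GB / 2 ))GB"
```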