Linux software RAID6: rebuild slow

linux, mdadm, performance, raid6, software-raid

I am trying to find the bottleneck in the rebuilding of a software raid6.

## Pause rebuilding when measuring raw I/O performance
# echo 1 > /proc/sys/dev/raid/speed_limit_min
# echo 1 > /proc/sys/dev/raid/speed_limit_max
## Drop caches so caching does not interfere with the measurement
# sync ; echo 3 | tee /proc/sys/vm/drop_caches >/dev/null
# time parallel -j0 "dd if=/dev/{} bs=256k count=4000 | cat >/dev/null" ::: sdbd sdbc sdbf sdbm sdbl sdbk sdbe sdbj sdbh sdbg 
4000+0 records in
4000+0 records out
1048576000 bytes (1.0 GB) copied, 7.30336 s, 144 MB/s
[... similar for each disk ...]
# time parallel -j0 "dd if=/dev/{} skip=15000000 bs=256k count=4000 | cat >/dev/null" ::: sdbd sdbc sdbf sdbm sdbl sdbk sdbe sdbj sdbh sdbg 
4000+0 records in
4000+0 records out
1048576000 bytes (1.0 GB) copied, 12.7991 s, 81.9 MB/s
[... similar for each disk ...]

So we can read sequentially at about 140 MB/s on the outer tracks and 82 MB/s on the inner tracks (skip=15000000 with bs=256k is roughly 3.9 TB into these 4 TB drives, i.e. near the innermost tracks) on all the drives simultaneously. Sequential write performance is similar.

This would lead me to expect a rebuild speed of 82 MB/s or more.
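A write test along the same lines might look like this (a sketch, not what was actually run; sdXX is a placeholder for a scratch disk whose contents may be destroyed, never an active array member):

## WARNING: this overwrites data on the target disk
# sync ; echo 3 | tee /proc/sys/vm/drop_caches >/dev/null
# time parallel -j0 "dd if=/dev/zero of=/dev/{} bs=256k count=4000 oflag=direct" ::: sdXX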

# echo 800000 > /proc/sys/dev/raid/speed_limit_min
# echo 800000 > /proc/sys/dev/raid/speed_limit_max
# cat /proc/mdstat
md2 : active raid6 sdbd[10](S) sdbc[9] sdbf[0] sdbm[8] sdbl[7] sdbk[6] sdbe[11] sdbj[4] sdbi[3](F) sdbh[2] sdbg[1]
      27349121408 blocks super 1.2 level 6, 128k chunk, algorithm 2 [9/8] [UUU_UUUUU]
      [=========>...........]  recovery = 47.3% (1849905884/3907017344) finish=855.9min speed=40054K/sec

But we only get 40 MB/s. And often this drops to 30 MB/s.
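To see how the speed fluctuates over time, one option (a sketch, assuming this kernel exposes the md sysfs entries) is to sample the array's current sync speed periodically:

## Log the current rebuild speed (in KB/s) once per minute
# while sleep 60; do echo "$(date +%T) $(cat /sys/block/md2/md/sync_speed) KB/s"; done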

# iostat -dkx 1
Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sdbc              0.00  8023.00    0.00  329.00     0.00 33408.00   203.09     0.70    2.12   1.06  34.80
sdbd              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdbe             13.00     0.00 8334.00    0.00 33388.00     0.00     8.01     0.65    0.08   0.06  47.20
sdbf              0.00     0.00 8348.00    0.00 33388.00     0.00     8.00     0.58    0.07   0.06  48.00
sdbg             16.00     0.00 8331.00    0.00 33388.00     0.00     8.02     0.71    0.09   0.06  48.80
sdbh            961.00     0.00 8314.00    0.00 37100.00     0.00     8.92     0.93    0.11   0.07  54.80
sdbj             70.00     0.00 8276.00    0.00 33384.00     0.00     8.07     0.78    0.10   0.06  48.40
sdbk            124.00     0.00 8221.00    0.00 33380.00     0.00     8.12     0.88    0.11   0.06  47.20
sdbl             83.00     0.00 8262.00    0.00 33380.00     0.00     8.08     0.96    0.12   0.06  47.60
sdbm              0.00     0.00 8344.00    0.00 33376.00     0.00     8.00     0.56    0.07   0.06  47.60

iostat says the disks are only 40-50% busy, not 100%. This fits the hypothesis that the raw per-disk maximum is around 80 MB/s: reading roughly 33 MB/s per disk at 40-50% utilisation points to a ceiling near 80 MB/s, so the spindles themselves do not look saturated.
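For a rough aggregate view of the member disks during the rebuild (a sketch, assuming the iostat column layout shown above, i.e. rkB/s in column 6 and %util last):

# iostat -dkx sdbc sdbe sdbf sdbg sdbh sdbj sdbk sdbl sdbm 5 | awk '/^sd/ { r += $6; u += $NF; n++ } /^$/ && n { printf "read %.0f MB/s, avg util %.0f%%\n", r/1024, u/n; r = u = n = 0 }'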

Since this is software RAID, the limiting factor could be the CPU. top says:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                              
38520 root      20   0     0    0    0 R   64  0.0   2947:50 md2_raid6
 6117 root      20   0     0    0    0 D   53  0.0 473:25.96 md2_resync

So md2_raid6 and md2_resync are clearly busy, taking up 64% and 53% of a CPU respectively, but neither is anywhere near 100%.
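One thing worth ruling out is the raw RAID6 parity computation speed: the kernel benchmarks its syndrome functions at boot and logs the winner, which is typically in the GB/s range, far above 40 MB/s. A quick check, assuming the boot messages are still in the kernel ring buffer:

## Boot-time RAID6 algorithm benchmark, e.g. "raid6: using algorithm sse2x4 ..."
# dmesg | grep -i raid6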

The chunk size of the RAID (128k) was chosen after measuring which chunk size gave the smallest CPU penalty.
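For completeness, a hypothetical sketch of how such a chunk-size comparison might be run on scratch disks whose contents may be destroyed (/dev/md9 and /dev/sdx[a-j] are placeholders, not the real array):

## WARNING: destroys data on the listed scratch disks
# for chunk in 64 128 256 512; do
>   mdadm --create /dev/md9 --level=6 --chunk=$chunk --raid-devices=10 /dev/sdx[a-j]
>   sleep 300                         ## let the initial resync reach steady state
>   top -b -n1 | grep md9_raid6       ## note the CPU usage at this chunk size
>   mdadm --stop /dev/md9
>   mdadm --zero-superblock /dev/sdx[a-j]
> done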

If this speed is normal: What is the limiting factor? Can I measure that?

If this speed is not normal: How can I find the limiting factor? Can I change that?

Best Answer

I don't remember exactly what speeds I got when I migrated from a 4-disk RAID 5 to a 6-disk RAID 6, but they were similar (4 TB usable array, 24 h rebuild, so around 45 MB/s).

You have to remember that even with speed_limit_min raised, md will still give some priority to applications that try to use the array. The mechanism it uses to detect that activity may need to keep disk load at around 50% so it can still serve the I/O requests. Did you try unmounting the filesystem on the array?
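A quick way to test that theory (a sketch; the mount point is a placeholder) is to stop all filesystem activity on the array and watch whether the sync speed jumps:

# mount | grep md2                           ## is anything on md2 mounted?
# umount /mnt/md2                            ## hypothetical mount point
# watch cat /sys/block/md2/md/sync_speed     ## current rebuild speed in KB/s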

To check for bottlenecks you'll have to trace the kernel (for example with LTTng, the Linux Trace Toolkit next generation, or SystemTap). That is not easy and will take a lot of time, so unless you have to rebuild arrays on more than a few computers it is probably not worth it. As for changing it: I'm sure patches to the Linux kernel would be welcome :)
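A lighter-weight first look than full kernel tracing, assuming perf is installed and the thread names match the top output above, is to sample where the md threads spend their time:

## Sample the md threads for 30 seconds, then show the hottest symbols
# perf record -g -p $(pgrep -d, 'md2_raid6|md2_resync') -- sleep 30
# perf report --sort comm,symbol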
