RAID0 IOPS Calculation – Why Sum of Individual Disk IOPS Differs from RAID0

hard-drive iostat linux raid0

I have a RAID0 setup consisting of two physical disks:

bash-4.2$ lsblk 

NAME    MAJ:MIN RM  SIZE RO TYPE  MOUNTPOINT
xvda    202:0    0  100G  0 disk  
`-xvda1 202:1    0  100G  0 part  /
nvme0n1 259:0    0  1.7T  0 disk  
`-md0     9:0    0  3.5T  0 raid0 /home/ec2-user/deploy
nvme1n1 259:1    0  1.7T  0 disk  
`-md0     9:0    0  3.5T  0 raid0 /home/ec2-user/deploy

I generated some write load with the following command:

dd if=/dev/random of=/home/ec2-user/deploy/testfile bs=1024 count=4000000

I then collected iostat statistics for md0 and the two physical disks:

iostat -d 1
Linux 4.14.154-128.181.amzn2.x86_64 (ip-10-123-151-189.ap-northeast-1.compute.internal)     11/08/2021  _x86_64_    (16 CPU)

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
xvda              0.50         0.03         3.07     253733   25752714
nvme0n1          96.83         2.76       853.15   23209231 7166198418
nvme1n1          96.66         2.72       851.67   22806490 7153813102
md0             294.30         5.75      1711.15   48271101 14373181088

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
xvda              0.00         0.00         0.00          0          0
nvme0n1          73.00         0.00       968.00          0        968
nvme1n1          30.00         0.00       268.00          0        268
md0             306.00         0.00      1236.00          0       1236

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
xvda              0.00         0.00         0.00          0          0
nvme0n1          62.00         0.00       784.00          0        784
nvme1n1          32.00         0.00       756.00          0        756
md0             382.00         0.00      1540.00          0       1540
...

I was expecting the tps of nvme0n1 and nvme1n1 to sum to the tps of md0.

Is the large difference due to a difference in block size? Do the physical disks merge multiple writes into a single disk write? Is there a way to confirm that hypothesis?

Best Answer

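md0, as a full-fledged block device, has its own scheduler and write merging. So, while the kB_wrtn/s values of the member disks add up to what is reported for the RAID array itself (968 + 268 = 1236 in the second sample), the tps of the member disks is considerably lower, because requests are merged at the md0 level into fewer, larger writes.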

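To get better visibility into merging, run iostat -x -k 1 and look at the wareq-sz field (or rareq-sz for reads), which reports the average request size in kB. Older sysstat releases show a single avgrq-sz column (in sectors) instead.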
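
As a rough cross-check that needs no extra iostat flags, the average request size can also be estimated from the plain iostat -d output by dividing throughput by tps. The sketch below assumes the six-column layout shown in the question (Device, tps, kB_read/s, kB_wrtn/s, ...); avg_req_kb is a hypothetical helper name, not part of sysstat, and newer sysstat versions with extra columns would need the field numbers adjusted.

```shell
# avg_req_kb (hypothetical helper): read `iostat -d` output on stdin and
# print "<device> <average kB per request>" for every device with tps > 0.
# Assumes column 2 = tps, column 3 = kB_read/s, column 4 = kB_wrtn/s.
avg_req_kb() {
  awk '
    # Skip headers, banners and blank lines: column 2 must be a plain number.
    $2 ~ /^[0-9]+([.][0-9]+)?$/ && ($2 + 0) > 0 {
      printf "%s %.1f\n", $1, ($3 + $4) / $2
    }'
}

# Demo on the second sample from the question.
printf '%s\n' \
  'Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn' \
  'xvda              0.00         0.00         0.00          0          0' \
  'nvme0n1          73.00         0.00       968.00          0        968' \
  'nvme1n1          30.00         0.00       268.00          0        268' \
  'md0             306.00         0.00      1236.00          0       1236' \
  | avg_req_kb
# nvme0n1 13.3
# nvme1n1 8.9
# md0 4.0
```

On a live system the same function can be fed directly, for example iostat -d 1 | avg_req_kb. In this sample md0 receives requests of about 4 kB each, while the member disks see noticeably larger ones, which is what one would expect if small writes are being merged before they hit the physical devices.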
