Linux – 2 drives, slow software RAID1 (md)

Tags: hard-drive, linux, raid, software-raid

I've got a server from hetzner.de (EQ4) with two SAMSUNG HD753LJ drives (750 GB, 32 MB cache).

OS is CentOS 5 (x86_64). Drives are combined together into two RAID1 partitions:

  1. /dev/md0, which is 512 MB and holds only the /boot partition
  2. /dev/md1, which is over 700 GB and holds one big LVM volume group containing all other partitions
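
(For completeness: a layout like this can be verified with the standard md and LVM inspection tools. That the member partitions are sda1/sdb1 and sda2/sdb2 is an assumption, based on the iostat output further down.)

# cat /proc/mdstat          # lists both mirrors and their member partitions
# mdadm --detail /dev/md1   # array state; should report "clean" with 2 active devices
# pvs; lvs                  # the LVM physical volume on md1 and the logical volumes on top of it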

Now, I've been running some benchmarks, and it seems that even though the drives are exactly the same model, the speed differs a bit between them.

# hdparm -tT /dev/sda

/dev/sda:
 Timing cached reads:   25612 MB in  1.99 seconds = 12860.70 MB/sec
 Timing buffered disk reads:  352 MB in  3.01 seconds = 116.80 MB/sec

# hdparm -tT /dev/sdb

/dev/sdb:
 Timing cached reads:   25524 MB in  1.99 seconds = 12815.99 MB/sec
 Timing buffered disk reads:  342 MB in  3.01 seconds = 113.64 MB/sec
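
(To cross-check the hdparm numbers, an uncached sequential read of each raw device can be done with dd; the 1 GB read size is arbitrary, iflag=direct bypasses the page cache, and reading from the device is non-destructive.)

# dd if=/dev/sda of=/dev/null bs=1M count=1024 iflag=direct
# dd if=/dev/sdb of=/dev/null bs=1M count=1024 iflag=direct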

Also, when I run e.g. pgbench, which stresses I/O quite heavily, I see the following in the iostat output:

Device:         rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00   231.40  0.00 298.00     0.00  9683.20    32.49     0.17    0.58   0.34  10.24
sda1              0.00     0.00  0.00   0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sda2              0.00   231.40  0.00 298.00     0.00  9683.20    32.49     0.17    0.58   0.34  10.24
sdb               0.00   231.40  0.00 301.80     0.00  9740.80    32.28    14.19   51.17   3.10  93.68
sdb1              0.00     0.00  0.00   0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdb2              0.00   231.40  0.00 301.80     0.00  9740.80    32.28    14.19   51.17   3.10  93.68
md1               0.00     0.00  0.00 529.60     0.00  9692.80    18.30     0.00    0.00   0.00   0.00
md0               0.00     0.00  0.00   0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-0              0.00     0.00  0.00   0.60     0.00     4.80     8.00     0.00    0.00   0.00   0.00
dm-1              0.00     0.00  0.00 529.00     0.00  9688.00    18.31    24.51   49.91   1.81  95.92

Device:         rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00   152.40  0.00 330.60     0.00  5176.00    15.66     0.19    0.57   0.19   6.24
sda1              0.00     0.00  0.00   0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sda2              0.00   152.40  0.00 330.60     0.00  5176.00    15.66     0.19    0.57   0.19   6.24
sdb               0.00   152.40  0.00 326.20     0.00  5118.40    15.69    19.96   55.36   3.01  98.16
sdb1              0.00     0.00  0.00   0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdb2              0.00   152.40  0.00 326.20     0.00  5118.40    15.69    19.96   55.36   3.01  98.16
md1               0.00     0.00  0.00 482.80     0.00  5166.40    10.70     0.00    0.00   0.00   0.00
md0               0.00     0.00  0.00   0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-0              0.00     0.00  0.00   0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-1              0.00     0.00  0.00 482.80     0.00  5166.40    10.70    30.19   56.92   2.05  99.04

Device:         rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00   181.64  0.00 324.55     0.00  5445.11    16.78     0.15    0.45   0.21   6.87
sda1              0.00     0.00  0.00   0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sda2              0.00   181.64  0.00 324.55     0.00  5445.11    16.78     0.15    0.45   0.21   6.87
sdb               0.00   181.84  0.00 328.54     0.00  5493.01    16.72    18.34   61.57   3.01  99.00
sdb1              0.00     0.00  0.00   0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdb2              0.00   181.84  0.00 328.54     0.00  5493.01    16.72    18.34   61.57   3.01  99.00
md1               0.00     0.00  0.00 506.39     0.00  5477.05    10.82     0.00    0.00   0.00   0.00
md0               0.00     0.00  0.00   0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-0              0.00     0.00  0.00   0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-1              0.00     0.00  0.00 506.39     0.00  5477.05    10.82    28.77   62.15   1.96  99.00
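
(Output like the above comes from iostat's extended device statistics; the 5-second interval below is just an example.)

# iostat -dx 5    # watch await and %util on sda2 vs sdb2 while pgbench is running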

And this is what has me completely confused. How come two drives with exactly the same specifications show such a difference in write speed (see %util)? I haven't really paid attention to these numbers before, so perhaps this is normal; if someone could confirm that, I would be really grateful.

Otherwise, if someone has seen such behavior before or knows what is causing it, I would really appreciate an answer.

I'll also add that the output of both "smartctl -a" and "hdparm -I" is exactly the same for the two drives and does not indicate any hardware problems.
The slower drive has already been replaced twice (with new ones). I also asked for the drives to be swapped between bays, after which sda was the slower one and sdb the quicker one (so the slow one was still the same drive).
The SATA cables have been changed twice as well.
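
(The comparison in question boils down to something like the following; the grep patterns are just the usual suspects and can be adjusted as needed.)

# smartctl -a /dev/sda | grep -i -e reallocated -e pending -e crc
# smartctl -a /dev/sdb | grep -i -e reallocated -e pending -e crc
# hdparm -I /dev/sda | grep -i -e queue -e "write cache"
# hdparm -I /dev/sdb | grep -i -e queue -e "write cache"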

Best Answer

Could you please try the bonnie++ benchmark tool? You should run it with a test size of twice the amount of RAM (example for 1 GB of RAM):

bonnie++ -s $((2*1024))

Your problem description makes me think the controller can't properly handle the parallel writes that software RAID1 generates. To check whether this hypothesis is true, run the command above in the following situations (a sketch of all three runs follows the list):

1) Separate benchmarks on each hard disk; the hypothesis predicts the results will be similar.

2) A benchmark of the RAID1 array.

3) Simultaneous benchmarks on the two disks; the hypothesis predicts this should look more like 2) than 1).
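
A rough sketch of the three runs, assuming you can mount a filesystem on each disk individually (on this box that would mean temporarily degrading the mirror, so treat /mnt/disk-a, /mnt/disk-b and /mnt/raid as placeholder mount points) and assuming 1 GB of RAM as in the example above; -u root is only needed if you run the benchmark as root:

# 1) one disk at a time
bonnie++ -d /mnt/disk-a -s $((2*1024)) -u root
bonnie++ -d /mnt/disk-b -s $((2*1024)) -u root

# 2) the RAID1 array (any directory on the LVM volume that lives on md1)
bonnie++ -d /mnt/raid -s $((2*1024)) -u root

# 3) both disks at the same time
bonnie++ -d /mnt/disk-a -s $((2*1024)) -u root &
bonnie++ -d /mnt/disk-b -s $((2*1024)) -u root &
wait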

Good luck,
João Miguel Neves
