Centos – Slow software raid with SSDs

centos, centos6.4, mdraid, software-raid

We have moved a web application with an Oracle database to a new server because the old one was dying. The old server had two mirrored hard discs plus a separate non-mirrored SSD for the Oracle datafiles (but not the redo and undo logs). The new server has almost the same configuration, except that it now has two SSDs so that the datafiles can be mirrored as well.

Unfortunately, the random write performance of the software RAID-1 with SSDs was very poor. During the night, when a large amount of data is merged into the database, the web application almost stopped working because simple insert operations like adding a log entry took 20 seconds or more. The RAID-1 simply could not keep up with Oracle's write requests caused by the nightly jobs (random access to the datafiles).

I then reverted the configuration to the old one: no RAID, just a single SSD for the datafiles. Now the performance problems are gone, the web application is snappy at all times, and the nightly jobs are about 10 times faster than with the RAID (and about as fast as on the old server).

How can the software RAID possibly be at least 10 times slower than the same drive without a RAID?

Hardware:

  • Intel Xeon E3-1245 V2 @ 3.40GHz
  • 32 GB RAM
  • 2 x Seagate Constellation ES.2 ST33000650NS
  • 2 x INTEL SSDSC2BW240A4

Commands to set up the RAID:

# mdadm --create --name=3 /dev/md/3 --level=raid1 --raid-devices=2 /dev/sda1 /dev/sdb1
# mkfs.ext4 /dev/md3

BTW: I can't run any experiments on the new server, as we were under pressure to put it into production (the old one was dying).

Best Answer

You've got enough money to pay for Oracle but not enough money for a test environment?

This is not an answer (though it's a bit long for a comment), just some observations:

SSDs lie about their physical block size: the unit that really matters is the erase block, which is huge.

Most disks also lie about their geometry (so you can format them from MS-DOS). This really hurts the performance of data-striping RAID levels, although I wouldn't have expected much impact on mirroring.

You've not shown us how they were partitioned nor what journalling you have configured.
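For the partitioning question, a rough alignment check can be done from a partition's start sector alone. This is a minimal sketch, assuming the start sector is taken from /sys/block/<disk>/<partition>/start (which counts 512-byte units) and assuming a 1 MiB boundary as a conservative stand-in for the unknown erase block size. `is_aligned` is a hypothetical helper name, not an existing tool.

```shell
#!/bin/sh
# is_aligned START_SECTOR [BOUNDARY_BYTES]
# START_SECTOR is in 512-byte units, as reported by
# /sys/block/<disk>/<partition>/start.
# BOUNDARY_BYTES defaults to 1 MiB; this is an assumption, since the
# real erase block size of the SSD is not known here.
is_aligned() {
    start_sector=$1
    boundary=${2:-1048576}
    # Convert the start sector to a byte offset on the disk.
    offset=$(( start_sector * 512 ))
    if [ $(( offset % boundary )) -eq 0 ]; then
        echo aligned
    else
        echo misaligned
    fi
}

# Old MS-DOS style partition start versus a 1 MiB-aligned start.
is_aligned 63      # prints "misaligned"
is_aligned 2048    # prints "aligned"
```

On the real machine the argument would come from something like `cat /sys/block/sda/sda1/start`, with the device names adjusted to match.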

You need to tell ext4 about the RAID configuration, although again IIRC this is more of an issue for striping than for mirroring.
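For a striped array, the values the filesystem wants are derived from the md chunk size and the filesystem block size. Below is a sketch of that arithmetic, assuming a 512 KiB chunk, 4 KiB blocks and two data disks (illustrative numbers, none of them from the question). `ext_stripe_opts` is a hypothetical helper. A plain RAID-1 has no chunk size, so this would not apply to the mirror described in the question.

```shell
#!/bin/sh
# ext_stripe_opts CHUNK_KIB BLOCK_BYTES DATA_DISKS
# Prints an extended-options string for mkfs.ext4 -E.
#   stride       = chunk size / filesystem block size
#   stripe_width = stride * number of data-bearing disks
ext_stripe_opts() {
    chunk_kib=$1
    block_bytes=$2
    data_disks=$3
    stride=$(( chunk_kib * 1024 / block_bytes ))
    stripe_width=$(( stride * data_disks ))
    echo "stride=${stride},stripe_width=${stripe_width}"
}

# Assumed example: 512 KiB chunk, 4096-byte blocks, 2 data disks.
ext_stripe_opts 512 4096 2    # prints "stride=128,stripe_width=256"
```

The resulting string would be passed along the lines of `mkfs.ext4 -E stride=128,stripe_width=256 <device>`.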

Write operations on a mirror will never be faster than on a single disk. They can be up to 2x slower, although more often the penalty is in the region of 20%.

The problem with SSDs is write wear. In a mirror it's unusual for spinning rust disks, even from the same batch, to fail at the same time. OTOH it's massively more probable that 2 SSDs will fail at the same time. One solution is to deliberately stagger the lifespan of the disks. But if I were setting up a machine with this hardware mix I'd have used mdadm to configure hybrid storage: mirror sets between SSD and HD.
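One way such a hybrid mirror could be set up is with mdadm's `--write-mostly` flag, which marks the devices listed after it so that md avoids reading from them where possible. The sketch below only prints the command rather than running it. The device names and the helper `print_hybrid_mirror_cmd` are assumptions for illustration, not the layout from the question.

```shell
#!/bin/sh
# print_hybrid_mirror_cmd MD_DEVICE SSD_PARTITION HDD_PARTITION
# Prints (does not execute) an mdadm command for a RAID-1 in which the
# HDD partition is flagged write-mostly, so reads are served from the SSD.
print_hybrid_mirror_cmd() {
    md_device=$1
    ssd_part=$2
    hdd_part=$3
    echo "mdadm --create ${md_device} --level=1 --raid-devices=2 ${ssd_part} --write-mostly ${hdd_part}"
}

# Hypothetical device names: /dev/sda1 is the SSD, /dev/sdc1 the hard disc.
print_hybrid_mirror_cmd /dev/md3 /dev/sda1 /dev/sdc1
```

Note that writes still go to both members, so this arrangement mainly helps reads. Whether it would suit the nightly random-write load here is something that would need testing.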

I suspect that the problem is at the filesystem tier - if it were not in production, then I'd suggest giving Oracle access to the mirror device as a raw partition and checking the performance.