SW SSD RAID 1 over HW RAID 10

hard-drive, hardware-raid, software-raid, ssd

A provider (data center) recommended I go with 1 TB SSDs in a software RAID 1 over a HW RAID 10 with mechanical drives.

Their quote:

Typically SSDs are more reliable than RAID cards, and since you have fewer parts, there are fewer points of failure. There won't be much of a CPU load since RAID 1 is extremely simple storage.

How true is that, and is software RAID 1 even ideal when running virtual machines? They say it is.

Some more details:
I plan to run Xen / Xen HVM / KVM; in other words, Linux will be running as the host, and I want a setup where the guests can run anything from Windows to Linux and can compile their own kernels.

What I want to accomplish:
To be able to quickly recognize a drive failure and have a replacement thrown in with little to no downtime or performance hits.

Best Answer

In RAID 10 any one of your drives can fail and the array will survive, the same as RAID 1. While a four-drive RAID 10 can also survive four of the six possible "two drives failed at once" combinations, the main reason to use R10 with four drives instead of R1 with two is performance rather than extra reliability, and the SSDs will give you a greater performance jump.

Early SSDs had reliability issues, but most properly run tests I've seen suggest those days are long gone and they tend to be no more likely to fail than spinning-metal drives: overall reliability has increased and wear-levelling tricks are getting very intelligent.

is software RAID 1 even ideal when running virtual machines?

I'm assuming you are running the RAID array on the host, in which case, unless you have a specific load pattern in your VMs (one that would be a problem on direct physical hardware too), the difference between soft RAID and hard RAID is not going to depend on the use of VMs. If you are running RAID inside the VMs then you are likely doing something wrong (unless the VMs are for learning or testing RAID management, of course).

The key advantages of hardware RAID are:

  • Potential speed boost due to multiplexed writes: software RAID 1 will likely write to each drive in turn, whereas with hardware RAID 1 the OS writes just once and the controller writes to both drives in parallel. In theory this can double your peak bulk transfer rate (though in reality the difference will likely be far smaller), but it will make little or no difference to random writes (where, with spinning metal, the main bottleneck is head movement, and with SSDs it is needing to write larger blocks even for small writes, plus the block-clearing time if no pre-erased blocks are ready).
  • Safety through a battery-backed (or solid-state) cache (though this is only on high-spec controllers), allowing caching to be done safely on the controller: in the event of sudden power loss, the controller can hold on to written blocks that haven't hit the drives yet and write them when power returns.
  • Hot-swap is more likely to be supported (though your DC's kit may support hot-swap more generally so it may be available for SW RAID too).

The key advantage of good software RAID (i.e. Linux's mdadm-managed arrays) is:

  • Your array is never locked to a given controller (or worse, specific versions of a given controller), meaning your arrays can be moved to new kit if all the other hardware fails but the drives survive. I've used this to save a file server whose motherboard died: we just transplanted the drives into a new box and everything came back up with no manual intervention (see the sketch below). We did verify the drives against a recent backup and replace them ASAP, in case the death was a power problem that had affected but not immediately killed the drives, but the easy transplant meant greatly reduced downtime outside maintenance windows. This is less of an issue if your DC is well stocked with spare parts immediately to hand, of course.
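
As a minimal sketch of what that kind of transplant looks like with Linux's mdadm (the device names /dev/sdb1, /dev/sdc1 and /dev/md0 are illustrative assumptions, not details from that server):

    # Inspect the RAID superblocks on the transplanted member partitions
    sudo mdadm --examine /dev/sdb1 /dev/sdc1

    # Re-assemble any arrays found on the attached drives; the metadata lives
    # on the drives themselves, so no controller-specific config is needed
    sudo mdadm --assemble --scan

    # Confirm the array is up and both mirror halves are active
    cat /proc/mdstat
    sudo mdadm --detail /dev/md0

Because the array metadata is on the drives rather than on a RAID card, this works on any box that can see the disks.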

On SSD Reliability & Performance:

SSDs over-provision space for two reasons: it leaves plenty of blocks free to be remapped if a block goes bad (traditional drives do this too), and it closes the write-performance hole (except under huge write-heavy loads) even where TRIM is not used, because the extra blocks can cycle through the wear-levelling pool along with all the others (and the controller can pre-erase them ready for next use at its leisure).

Consumer-grade drives only really over-provision enough for the remapping use and a small amount of performance protection, so it is useful to manually under-allocate (partitioning only 200 GiB of a 240 GB drive, for instance), which has a similar effect. See reports like this one for details (that report is released by a controller manufacturer but reads as a general description of the matter rather than a sales pitch; you'll no doubt find manufacturer-neutral reports on the same subject if you look for them). Enterprise-grade drives tend to over-provision by much larger amounts, for both of the above reasons: reliability and performance.
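
A rough sketch of that manual under-allocation, assuming a blank (or freshly discarded) 240 GB drive at /dev/sdb; the device name and sizes are just examples:

    # On a previously used drive, discard everything first so the controller
    # knows the spare area really is free (this destroys all data on the drive;
    # skip it on a brand-new disk)
    sudo blkdiscard /dev/sdb

    # Partition only ~200 GiB and leave the rest unallocated, so the
    # controller can fold the untouched blocks into its wear-levelling pool
    sudo parted -s /dev/sdb mklabel gpt
    sudo parted -s /dev/sdb mkpart primary 1MiB 200GiB

    # Verify the layout; the free space after the partition stays untouched
    sudo parted /dev/sdb unit GiB print free

Whether the firmware actually treats never-written space as extra spare area varies by drive, so check the manufacturer's documentation rather than taking this sketch as a guarantee.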