The current state (2016) of SSDs in RAID

Tags: raid, ssd, trim

There are plenty of resources online that discuss using SSDs in RAID configurations. However, most of them date back a few years, and the SSD ecosystem moves very fast: we are expecting Intel's "Optane" product release later this year, which will change everything… again.

I'll preface my question by affirming there is a qualitative difference between consumer-grade SSDs (e.g. Intel 535) and datacenter-grade SSDs (e.g. Intel DC S3700).

My primary concern relates to TRIM support in RAID scenarios. To my understanding, despite it being over 6 years since SSDs were introduced in consumer-grade computers and 4 years since NVMe was commercially available – modern-day RAID controllers still do not support issuing TRIM commands to attached SSDs – with the exception of Intel's RAID controllers in RAID-0 mode.

I'm surprised that TRIM support is not present in RAID-1 mode: given the way the drives mirror each other, it seems straightforward. But I digress.

If you want fault tolerance with disks (both HDD and SSD), you would use them in a RAID configuration. But SSDs without TRIM would suffer write amplification, which causes extra wear, which in turn would cause the SSDs to fail prematurely. This is an unfortunate irony: a system designed to protect against drive failure might end up directly causing it.

So:

  • Is TRIM support necessary for modern (2015-2016 era) SSDs?
    • Is there any difference in the need for TRIM support between SATA, SATA-Express, and NVMe-based SSDs?
  • Often drives are advertised as having improved built-in garbage-collection; does that obviate the need for TRIM? How does their GC process work in RAID environments?
  • A lot of articles and discussions from earlier years concern SLC vs MLC flash and say that SLC is preferable due to its much longer lifespan. However, it seems all SSDs (regardless of where they sit on the consumer-to-enterprise spectrum) are MLC these days. Is this distinction still relevant?
    • And what about TLC flash?
  • Enterprise SSDs tend to have much higher endurance / write limits (often measured in how many times you can completely overwrite the drive per day throughout the drive's expected 5-year lifespan). If their write-cycle limit is very high (e.g. 100 complete writes per day), does this mean that they don't need TRIM at all because those limits are so high, or, on the contrary, are those limits only attainable by using TRIM?

Best Answer

Let's try to answer one question at a time:

  • Is TRIM support necessary for modern (2015-2016 era) SSDs?

Short answer: in most cases, no. Long answer: if you reserve sufficient spare space (~20%), even consumer-grade drives usually have quite good performance consistency (but you need to avoid the drives that choke on sustained writes). Enterprise-grade drives are even better, both because they have more spare space by default and because their controller/firmware combo is optimized for continuous use of the drive. For example, take a look at the S3700 drive you referenced: even without trimming, it has very good write consistency.
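To make the ~20% spare-space figure concrete, here is a minimal sketch of the arithmetic. The function name and the 480 GB example drive are only illustrative assumptions, not anything specific to a given vendor.

```python
def partition_size_for_spare(raw_capacity_gb: float, spare_fraction: float = 0.20) -> float:
    """Return the size (in GB) to partition so that `spare_fraction`
    of the drive is left unallocated as extra spare area."""
    if not 0.0 <= spare_fraction < 1.0:
        raise ValueError("spare_fraction must be in [0, 1)")
    return raw_capacity_gb * (1.0 - spare_fraction)


if __name__ == "__main__":
    raw_gb = 480  # hypothetical consumer drive
    usable_gb = partition_size_for_spare(raw_gb, 0.20)
    print(f"Partition {usable_gb:.0f} GB, leave {raw_gb - usable_gb:.0f} GB unallocated")
```

Note that unallocated space only acts as spare area if the controller knows those blocks are free, so this is best done on a new or secure-erased drive.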

  • Often drives are advertised as having improved built-in garbage-collection; does that obviate the need for TRIM? How does their GC process work in RAID environments?

The drive's garbage collector does its magic inside the drive's own sandbox: it knows nothing about the outside environment. This means that it is (mostly) unaffected by the RAID level of the array. That said, some RAID levels (basically the parity-based ones) can sometimes, in some specific implementations, increase the write amplification factor, which in turn means more work for the GC routines.
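As a rough illustration of why parity RAID can add writes, here is a sketch of the textbook model: a small (partial-stripe) write on RAID 5 rewrites one data chunk plus one parity chunk, RAID 6 rewrites one data chunk plus two parity chunks, and a full-stripe write only adds the parity chunks on top of the data. This is an assumption-laden simplification; real controllers with write-back caches and stripe coalescing will behave differently.

```python
def raid_write_factor(level: str, drives: int, full_stripe: bool = False) -> float:
    """Device-level chunk writes per logical chunk written, summed over the array.

    Textbook model only: ignores caches, coalescing and the SSD's own
    internal write amplification.
    """
    parity_chunks = {"raid5": 1, "raid6": 2}
    if level == "raid0":
        return 1.0
    if level == "raid1":
        # Every mirror member receives its own copy of the data.
        return float(drives)
    if level not in parity_chunks:
        raise ValueError(f"unsupported level: {level}")
    p = parity_chunks[level]
    if drives <= p + 1:
        raise ValueError("not enough drives for this level")
    if full_stripe:
        # (drives - p) data chunks written together with p parity chunks.
        return drives / (drives - p)
    # Partial-stripe update: one data chunk plus every parity chunk is rewritten.
    return 1.0 + p


if __name__ == "__main__":
    for level, n in [("raid1", 2), ("raid5", 4), ("raid6", 6)]:
        print(level, n, "small:", raid_write_factor(level, n),
              "full stripe:", round(raid_write_factor(level, n, full_stripe=True), 2))
```

In this model the extra writes land on the parity chunks, which is the additional work the drives' GC routines then have to absorb.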

  • A lot of articles and discussion from earlier years concerns SLC vs MLC flash and that SLC is preferable, due to its much longer lifespan, however it seems all SSDs (regardless of where they sit on the Consumer-to-Enterprise spectrum) are MLC these days - is this distinction of relevance anymore?

SLC drives have basically disappeared from the enterprise, being relegated mainly to military and some industrial tasks. The enterprise market is now divided into three grades:

  • HMLC/MLCe flash is the one with the better binned MLC chips, and certified to sustain at least 25000/30000 rewrite cycles;
  • 3D MLC chips are rated at about 5000-10000 rewrite cycles;
  • normal planar MLC and 3D TLC chips are rated at about 3000 rewrite cycles.

In reality, any of the above flash types should provide you with plenty of total write capacity and, in fact, you can find enterprise drives with all of the above flash types.
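To see why even ~3000 rewrite cycles add up to a lot of total write capacity, here is a back-of-the-envelope sketch. The 400 GB capacity, the write amplification factor of 3 and the 100 GB/day workload are made-up example inputs, not figures for any particular drive.

```python
def total_host_writes_tb(capacity_gb: float, pe_cycles: int,
                         write_amplification: float = 1.0) -> float:
    """Approximate total host data (in TB) writable before the flash
    reaches its rated rewrite (program/erase) cycle count."""
    if write_amplification < 1.0:
        raise ValueError("write_amplification is assumed to be >= 1")
    return capacity_gb * pe_cycles / write_amplification / 1000.0


def years_of_life(capacity_gb: float, pe_cycles: int,
                  host_writes_gb_per_day: float,
                  write_amplification: float = 1.0) -> float:
    """Approximate lifetime in years at a constant daily write volume."""
    total_gb = total_host_writes_tb(capacity_gb, pe_cycles, write_amplification) * 1000.0
    return total_gb / host_writes_gb_per_day / 365.0


if __name__ == "__main__":
    # Hypothetical 400 GB drive, 3000 cycles, pessimistic write amplification of 3.
    print(total_host_writes_tb(400, 3000, write_amplification=3.0), "TB of host writes")
    print(round(years_of_life(400, 3000, 100, write_amplification=3.0), 1),
          "years at 100 GB/day")
```

The write amplification factor is the knob that missing TRIM and low spare space make worse, so it is worth plugging in a pessimistic value when sizing drives.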

The real differences between enterprise and consumer drives are:

  • the controller/firmware combo, with enterprise drives being much less likely to die from an unexpected controller bug;
  • the power-protected write cache, extremely important to prevent corruption of the Flash Translation Layer (FTL), which is stored on the flash itself.

Enterprise-grade drives are better mostly due to their controllers and power capacitors, rather than due to better flash.

  • Enterprise SSDs tend to have much higher endurance / write-limits (often measured in how many times you can completely overwrite the drive in a day, throughout a drive's expected 5 year lifespan), does this obviate any concerns over Write-Amplification caused by not running TRIM?

As stated above, enterprise-grade drives have much more spare space by default (~20%), which in turn drastically lowers the need for regular TRIMs.

As a side note, consider software RAID implementations that pass TRIM through to the member drives (Linux MD RAID reportedly does).
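If you want to check whether a Linux block device (a software RAID array or a plain disk) advertises discard/TRIM support, one way is to read its `queue/discard_max_bytes` attribute in sysfs, where a value of 0 means discards are not supported. A minimal sketch follows; the device names `md0` and `sda` are just examples, and the `sysfs_root` parameter is an assumption added here so the function can be pointed at a test directory.

```python
from pathlib import Path
from typing import Optional


def discard_max_bytes(device: str, sysfs_root: str = "/sys/block") -> Optional[int]:
    """Return the discard_max_bytes value for `device`, or None if it cannot be read."""
    path = Path(sysfs_root) / device / "queue" / "discard_max_bytes"
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


def supports_discard(device: str, sysfs_root: str = "/sys/block") -> bool:
    """True if the kernel reports a non-zero maximum discard size for the device."""
    value = discard_max_bytes(device, sysfs_root)
    return value is not None and value > 0


if __name__ == "__main__":
    for dev in ("md0", "sda"):  # example device names, adjust to your system
        print(dev, "discard supported:", supports_discard(dev))
```

This only tells you what the block layer advertises; whether TRIM is actually issued still depends on the filesystem being mounted with discard enabled or on periodic trimming being run.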