Concerns about simultaneous failure of SSDs in RAID 1

Tags: raid, raid1, raid10, raid6, ssd

Since SSDs have a write limit that requires wear leveling, one would assume that, all things being equal, two identical SSDs mirroring the same writes would wear nearly identically. When one drive fails, you can assume the other is holding on only by minute differences in materials and logic. Since wear leveling does not tie the physical location of bits to the logical structure, I would assume one solution is to mirror similarly sized but not identical drives, for example a 240 GB drive with a 256 GB drive. Even though I am not logically using that extra 16 GB of physical space, wear leveling would not allow that area to be ignored. Or am I conflating the mechanics of wear leveling?
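To put rough numbers on that over-provisioning idea, here is a minimal sketch, assuming the controller can spread wear across any physical NAND not exposed to the logical volume (the drive sizes are the ones from the question; the fixed 240 GB exposed size is an illustrative assumption):

```python
# Rough over-provisioning estimate: how much physical NAND is never addressed
# by the logical volume and is therefore free for wear leveling.
# Assumption: the controller can use all unaddressed space as spare area.

def over_provisioning_pct(physical_gb: float, logical_gb: float) -> float:
    """Extra physical space as a percentage of the exposed (logical) capacity."""
    return (physical_gb - logical_gb) / logical_gb * 100

# Mirroring a 240 GB logical volume onto drives of different physical sizes:
for physical in (240, 256, 512):
    print(f"{physical} GB drive exposing 240 GB -> "
          f"{over_provisioning_pct(physical, 240):.1f}% spare area")

# 240 GB drive exposing 240 GB -> 0.0% spare area
# 256 GB drive exposing 240 GB -> 6.7% spare area
# 512 GB drive exposing 240 GB -> 113.3% spare area
```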

Or should RAID 1 just be avoided entirely in favor of RAID 5 or RAID 6 with a hot spare? With no identical data being written, the wear should differ across drives, so the failure of a single drive would not indicate imminent failure of the others. Even if it did, the fault tolerance of RAID 6, which survives the loss of two drives, should ameliorate that concern. There would be a large processing hit to recalculate parity when swapping in a new drive, but the I/O speed of an SSD should reduce the total rebuild time compared to spinning media. Also, according to a RAID calculator, the total usable space from four 256 GB drives would be the same whether I went with RAID 10 or RAID 6 (see the sketch below). If I had the funds I would buy eight drives and two RAID cards, test both, and see how failures occur, but I simply don't have the budget for that research.
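As a sanity check on that capacity claim, a small sketch of the usual usable-capacity formulas for RAID 1/5/6/10 (assuming equally sized drives and ignoring metadata overhead), showing that four 256 GB drives come out the same under RAID 10 and RAID 6:

```python
# Usable capacity for common RAID levels, assuming n equally sized drives
# and ignoring filesystem/metadata overhead.

def usable_capacity(level: str, n_drives: int, drive_gb: float) -> float:
    if level == "raid1":    # full mirror: capacity of one drive
        return drive_gb
    if level == "raid5":    # one drive's worth of parity
        return (n_drives - 1) * drive_gb
    if level == "raid6":    # two drives' worth of parity
        return (n_drives - 2) * drive_gb
    if level == "raid10":   # striped mirrors: half the raw capacity
        return n_drives * drive_gb / 2
    raise ValueError(f"unknown level: {level}")

for level in ("raid10", "raid6", "raid5"):
    print(f"{level}: {usable_capacity(level, 4, 256):.0f} GB usable from 4 x 256 GB")

# raid10: 512 GB usable from 4 x 256 GB
# raid6: 512 GB usable from 4 x 256 GB
# raid5: 768 GB usable from 4 x 256 GB
```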

Which way should I go, RAID 10 or RAID 6? Is there much documentation on simultaneous failure of mirrored identical SSDs? If so, does writing different data to each device under RAID 6 protect against this, or does the quantity of data, not the shape of the data, dictate drive wear? Does a mismatched size offer some protection, since wear leveling will use all the available physical hardware rather than only what the logical structure dictates? And with the fast I/O of SSDs, does RAID 6 become more attractive when rebuilding data onto a replacement drive?
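On the rebuild-time question, a back-of-the-envelope sketch: rebuild time is roughly the data to reconstruct divided by the sustained write rate of the replacement drive. The throughput figures below are illustrative assumptions, and real rebuilds are also throttled by parity computation and ongoing host I/O.

```python
# Back-of-the-envelope rebuild time: data to reconstruct divided by the
# sustained write speed of the replacement drive. Real rebuilds are also
# limited by parity computation and competing host I/O.

def rebuild_hours(data_gb: float, write_mb_per_s: float) -> float:
    return data_gb * 1000 / write_mb_per_s / 3600

# Illustrative numbers: a 256 GB member rebuilt onto an SSD sustaining
# ~400 MB/s versus a spinning disk sustaining ~120 MB/s.
for label, speed in (("SSD ~400 MB/s", 400), ("HDD ~120 MB/s", 120)):
    print(f"{label}: ~{rebuild_hours(256, speed):.1f} h to rewrite 256 GB")

# SSD ~400 MB/s: ~0.2 h to rewrite 256 GB
# HDD ~120 MB/s: ~0.6 h to rewrite 256 GB
```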

Best Answer

In practice, the MTBF and write-cycle tolerances of an SSD are estimates, not a kill switch. If an SSD is rated for a billion writes, it doesn't die at a billion and one. Combine that with wear-leveling algorithms, TRIM, and other on-controller garbage collection, and two SSDs dying minutes or even days apart purely from write wear would be rare.
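To make the "estimate, not a kill switch" point concrete, a small sketch comparing bytes actually written against a drive's rated endurance. The TBW rating and LBA count are made-up illustrative values; the real figures would come from the spec sheet and a vendor-specific SMART counter such as Total_LBAs_Written.

```python
# Endurance consumed: bytes written so far versus the manufacturer's rated
# TBW (terabytes written). The rating is a warranty/estimate threshold,
# not a point at which the drive stops working.
# Illustrative values only.

SECTOR_BYTES = 512
rated_tbw = 150                      # e.g. a 150 TBW endurance rating
total_lbas_written = 90_000_000_000  # hypothetical SMART Total_LBAs_Written

written_tb = total_lbas_written * SECTOR_BYTES / 1e12
print(f"{written_tb:.1f} TB written, "
      f"{written_tb / rated_tbw * 100:.0f}% of the {rated_tbw} TBW rating")
# 46.1 TB written, 31% of the 150 TBW rating
```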

You should be monitoring your hardware for signs of impending failure anyway, so even if both disks were wearing out at the same time, you'd be able to replace them both before a catastrophic failure from something like write wear.
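One way to do that monitoring, sketched with smartctl driven from Python. The wear-related attribute names differ between vendors, so the keywords checked here are examples rather than a universal list, and the device paths are assumptions:

```python
# Minimal SMART wear check: run smartctl and surface attributes whose names
# suggest wear or reallocation. Attribute names vary by vendor, so the
# keyword list below is only an example.
import subprocess

WEAR_KEYWORDS = ("wear", "wearout", "reallocated", "percentage used")

def wear_report(device: str) -> list[str]:
    out = subprocess.run(
        ["smartctl", "-A", device],          # print the device's SMART attributes
        capture_output=True, text=True, check=False,
    ).stdout
    return [line.strip() for line in out.splitlines()
            if any(k in line.lower() for k in WEAR_KEYWORDS)]

if __name__ == "__main__":
    for dev in ("/dev/sda", "/dev/sdb"):     # both members of the mirror
        print(dev)
        for line in wear_report(dev):
            print(" ", line)
```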
