ZFS – Is RAIDZ-1 really that bad?

raidz truenas zfs

I have a NAS server with 4x 2TB WD RE4-GP drives in a RAID10 configuration (4TB usable). I'm running out of space (less than 1TB of usable space left). I have $0 to spend on bigger/more drives/enclosures.

I like what I've read about the data-integrity features of ZFS, which – on their own – are enough for me to switch from my existing XFS (software) RAID10. Then I read about ZFS's superior implementation of RAID5, so I thought I might even get up to 2TB more usable space in the bargain using RAIDZ-1.

However, I keep reading more and more posts saying to just never use RAIDZ-1, and that only RAIDZ-2 or higher is reliable enough to handle "real world" drive failures. Of course, in my case, RAIDZ-2 doesn't make any sense: with four drives it yields the same usable space as mirrors, so it'd be much better to use two mirrored vdevs in a single pool (RAID10).
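The capacity trade-off between these layouts can be worked out with a little arithmetic. The following is a minimal sketch, assuming equal-sized drives and ignoring ZFS metadata, padding, and reserved space; the function name `usable_tb` and the layout labels are made up for illustration.

```python
def usable_tb(layout: str, n_drives: int, drive_tb: float) -> float:
    """Approximate usable capacity in TB (ignores ZFS metadata/padding overhead)."""
    if layout == "mirror-pairs":
        # Striped two-way mirrors (RAID10-like): half the raw capacity.
        if n_drives % 2 != 0:
            raise ValueError("mirror pairs need an even number of drives")
        return n_drives / 2 * drive_tb

    # RAIDZ-N gives up roughly N drives' worth of space to parity.
    parity_drives = {"raidz1": 1, "raidz2": 2, "raidz3": 3}[layout]
    if n_drives <= parity_drives:
        raise ValueError("not enough drives for this RAIDZ level")
    return (n_drives - parity_drives) * drive_tb


if __name__ == "__main__":
    for layout in ("mirror-pairs", "raidz1", "raidz2"):
        print(f"{layout:12s} {usable_tb(layout, 4, 2.0):.1f} TB usable")
```

For the 4x 2TB case this gives about 4 TB for mirror pairs, 6 TB for RAIDZ-1, and 4 TB for RAIDZ-2, which is why RAIDZ-2 offers no space advantage over mirrors here.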

Am I crazy wanting to use RAIDZ-1 for 4x 2TB drives?

Should I just use a pool of two mirrored vdevs (essentially RAID10) and hope the compression gives me enough extra space?

Either way, I plan on using compression. I only have 8GB of RAM (maxed), so dedup isn't an option.

This will be on a FreeNAS server (about to replace the current Ubuntu OS) to avoid the stability issues of ZFS-on-Linux.

Best Answer

Before we go into specifics, consider your use case. Are you storing photos, MP3s and DVD rips? If so, you might not care whether you permanently lose a single block from the array. On the other hand, if it's important data, this might be a disaster.

The claim that RAIDZ-1 is "not good enough for real world failures" rests on the fact that you are likely to hit a latent media error on one of your surviving disks when reconstruction time comes. The same logic applies to RAID5.

ZFS mitigates this failure to some extent. If a RAID5 array can't be reconstructed, you are pretty much out of luck: copy your (remaining) data off and rebuild from scratch. ZFS, on the other hand, will reconstruct everything but the bad chunk and let the administrator "clear" the errors. You'll lose a file or a portion of a file, but you won't lose the entire array. And, of course, ZFS's block checksums mean that you will be reliably informed that there's an error. Without them, I believe it's possible (although unlikely) for multiple errors to result in a rebuild that apparently succeeds but gives you back bad data.

Since ZFS is a "Rampant Layering Violation," it also knows which areas don't have data on them, and can skip them in the rebuild. So if your array is half empty, you're roughly half as likely to have a rebuild error.
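To put rough numbers on the rebuild-risk argument, here is a back-of-the-envelope sketch. It assumes every bit read fails independently at a fixed unrecoverable-read-error (URE) rate; the default of 1 error per 10^14 bits is a commonly quoted datasheet-style figure used purely for illustration, not a measured value for these drives, and real disks do not fail this uniformly. The function and parameter names are hypothetical.

```python
import math


def p_rebuild_error(surviving_drives: int, drive_tb: float,
                    used_fraction: float = 1.0,
                    ure_per_bit: float = 1e-14) -> float:
    """Probability of at least one URE while reading the data a rebuild needs.

    used_fraction models ZFS skipping unallocated space; a traditional RAID5
    rebuild reads every sector, i.e. used_fraction = 1.0.
    ure_per_bit is an ASSUMED rate; check the drive's datasheet.
    """
    bits_read = surviving_drives * drive_tb * 1e12 * 8 * used_fraction
    # 1 - (1 - p)^n, computed via log1p/expm1 to avoid rounding problems.
    return -math.expm1(bits_read * math.log1p(-ure_per_bit))


if __name__ == "__main__":
    # 4x 2TB RAIDZ-1 with one failed drive: three survivors must be read.
    for used in (1.0, 0.5):
        for rate in (1e-14, 1e-15):
            p = p_rebuild_error(3, 2.0, used_fraction=used, ure_per_bit=rate)
            print(f"used={used:.0%} rate={rate:g}/bit -> P(error) ~ {p:.1%}")
```

Under this simple model the "half empty, half as likely" rule holds closely when the probability is small and only approximately when it is large. With ZFS an error of this kind costs the affected file rather than the whole pool.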

You can reduce the likelihood of these kinds of rebuild errors on any RAID level by running regular "zpool scrubs" or "mdadm checks" of your array. Other RAID implementations have similar commands or processes; e.g., LSI/Dell PERC RAID cards call this "patrol read." These read everything on the array, which may help the disk drives find failing sectors and reassign them before the damage becomes permanent. If a sector is already unreadable, the RAID system (ZFS/md/RAID card/whatever) can rebuild the data from parity.

Even if you use RAIDZ2 or RAID6, regular scrubs are important.
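As a rough illustration of what a scheduled scrub boils down to, the sketch below builds the commands involved. The pool name `tank` and the md device `md0` are placeholders, and the helper names are invented for this example. `zpool scrub` and `zpool status -v` are the standard ZFS commands; on Linux md, a check is started by writing `check` to the array's `sync_action` file in sysfs. By default the script only prints the commands, since actually running them needs root and a real pool.

```python
import subprocess


def zfs_scrub_commands(pool: str) -> list[list[str]]:
    """Commands to start a scrub and then show its progress and any errors."""
    return [["zpool", "scrub", pool], ["zpool", "status", "-v", pool]]


def md_check_path(md_device: str) -> str:
    """Sysfs file that starts an md consistency check when 'check' is written to it."""
    return f"/sys/block/{md_device}/md/sync_action"


def start_zfs_scrub(pool: str, dry_run: bool = True) -> None:
    for cmd in zfs_scrub_commands(pool):
        if dry_run:
            print(" ".join(cmd))              # only show what would be run
        else:
            subprocess.run(cmd, check=True)   # raises if the command fails


if __name__ == "__main__":
    start_zfs_scrub("tank")                   # "tank" is a placeholder pool name
    print(f"echo check > {md_check_path('md0')}")  # md equivalent, for comparison
```

In practice this would be run from cron or the NAS's own scheduler, with the `zpool status` output reviewed (or mailed) afterwards.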

One final note: RAID of any sort is not a substitute for backups. It won't protect you against accidental deletion, ransomware, etc., although regular ZFS snapshots can be part of a backup strategy.