Combination of ZFS and Hardware to gain Raid 51

hardware-raid high-availability software-raid zfs

I'm looking into RAID solutions for a very large file server (70+TB, ONLY serving NFS and CIFS). I know using ZFS RAID on top of hardware RAID is generally contraindicated; however, I find myself in an unusual situation.

My personal preference would be to set up large RAID-51 virtual disks, i.e. two mirrored RAID5 sets, with each RAID5 having 9 data + 1 hot spare (so we don't lose TOO much storage space). This eases my administrative paranoia by having the data mirrored on two different drive chassis, while allowing for 1 disk failure in each mirror set before a crisis hits.
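To put a rough number on "not TOO much storage space", here is a minimal sketch of the capacity arithmetic. It assumes "9 data" means nine disks in each RAID5 set (so one disk's worth of parity per set) plus one hot spare per chassis; if nine disks' worth of usable data was meant, pass 10 instead. The function name is made up for illustration.

```python
def raid51_layout(disks_per_raid5, hot_spares=1):
    """Return (usable_disk_equivalents, total_disks) for two mirrored RAID5 sets.

    Assumption: each chassis holds one RAID5 set plus its hot spares,
    and the two sets mirror each other.
    """
    if disks_per_raid5 < 3:
        raise ValueError("RAID5 needs at least 3 disks")
    usable = disks_per_raid5 - 1                # one disk's worth of parity per set
    total = 2 * (disks_per_raid5 + hot_spares)  # two chassis, same layout in each
    return usable, total                        # the mirror stores everything twice


usable, total = raid51_layout(9)
efficiency = usable / total
print(f"{usable} of {total} disks usable ({efficiency:.0%})")
```

Under that reading, 20 physical disks yield 8 disks' worth of usable space, so the raw capacity needed is 2.5 times the target.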

HOWEVER, this question stems from the fact that we have existing hardware RAID controllers (LSI MegaRAID integrated disk chassis + server), licensed ONLY for RAID5 and 6. We also have an existing ZFS file system, which is intended (but not yet configured) to provide HA using RSF-1.

From what I see in the documentation, ZFS does not provide a RAID-51 solution.

The suggestion is to use the hardware raid to create two, equally sized RAID5 virtual disks on each chassis. These two RAID5 virtual disks are then presented to their respective servers as /dev/sdx.

Then use ZFS + RSF-1 to mirror those two virtual disks as an HA mirror set (see image).

Is this a good idea, a bad idea, or just an ugly (but usable) configuration?

Are there better solutions?

[Image: diagram of the proposed configuration, not reproduced here]

Best Answer

Interesting question...

I'm surprised about the use of RSF-1. Have you seen my guide for ZFS on Linux High Availability?

It is possible to engineer what you've described above; however, it may be overcomplicated.

Separately, none of these are issues:

Using ZFS in high availability is definitely a normal request.
And using ZFS on hardware RAID is okay.

The issue with your design is that you're injecting more complexity, performance loss and more points of failure into the solution.

CIFS and NFS workloads can benefit from ZFS caching, so that also has to factor into the topology.

If you're stuck with the hardware and can't plan for a shared-chassis solution with multi-path SAS disks, it's best to create two fault domains and build a shared-nothing solution or reevaluate requirements and determine acceptable loss. E.g. what failure modes and situations are you protecting against?

If I had the equipment you've described above, I'd build the two storage servers as separate nodes; designate one as primary, the other as secondary, and use a form of continuous ZFS asynchronous replication from primary to secondary. This is possible to do safely at 15 second intervals, even on busy systems.
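A rough sketch of what one replication cycle could look like, using the standard `zfs snapshot`, `zfs send -i` and `zfs receive -F` commands piped over SSH. This is not the specific tooling I use; the dataset name `tank/data`, the host name `standby` and the snapshot names are hypothetical, and it assumes an initial full send has already seeded the secondary. Cleanup of old snapshots and error recovery are left out.

```python
import subprocess


def snapshot_cmd(dataset, name):
    # Point-in-time snapshot on the primary
    return ["zfs", "snapshot", f"{dataset}@{name}"]


def send_cmd(dataset, prev, new):
    # Incremental stream containing only the changes between prev and new
    return ["zfs", "send", "-i", f"{dataset}@{prev}", f"{dataset}@{new}"]


def receive_cmd(host, dataset):
    # -F rolls the target back to its latest snapshot before receiving
    return ["ssh", host, "zfs", "receive", "-F", dataset]


def replicate_once(dataset, host, prev, new):
    """Snapshot the primary and ship the delta to the secondary."""
    subprocess.run(snapshot_cmd(dataset, new), check=True)
    send = subprocess.Popen(send_cmd(dataset, prev, new), stdout=subprocess.PIPE)
    try:
        subprocess.run(receive_cmd(host, dataset), stdin=send.stdout, check=True)
    finally:
        send.stdout.close()
    if send.wait() != 0:
        raise RuntimeError("zfs send failed")


# Show the commands for one cycle without running them (hypothetical names)
for cmd in (snapshot_cmd("tank/data", "repl-2"),
            send_cmd("tank/data", "repl-1", "repl-2"),
            receive_cmd("standby", "tank/data")):
    print(" ".join(cmd))
```

Calling `replicate_once` from a loop or timer every 15 seconds, with `new` from one cycle becoming `prev` in the next, gives the continuous asynchronous replication described above.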

If you want to use both servers simultaneously, you could serve different sets of data on each and replicate both ways. If you need to do this with hardware RAID on each node, that's fine. You'd still be able to leverage ZFS RAM caching, volume management, compression and possibly L2ARC and SLOG devices.

This design is simpler and maximizes performance. It also opens up the opportunity to use faster caching drives and eliminates the requirement for SAS data pool disks. You can also add geographic separation between the nodes or introduce a third node if needed.