Linux – Windows Storage Spaces – a useful replacement for RAID6

linux raid6 storage storage-spaces windows

// short update at the bottom

// another update is near the bottom to reply to a suggested edit

So, at first I had this idea: find a virtual driver to set up and use software RAID on Windows. Result: failed, even with support from the developer.

The next idea came to mind after watching a YouTube video about virtualization: put in a second, rather cheap GPU for a Linux system that runs on bare metal, and run my Windows in a VM with my main GPU via passthrough. This way I could have used mdadm/LVM and let Linux do all the software RAID stuff. Result: failed, due to some weird issues with my motherboard not liking the second GPU at all.

Then I read about Windows Storage Spaces and that it is able to provide fault tolerance comparable to a software RAID6 (as far as I understand, it's done by filesystem shadow copies spread across the physical drives). So I gave it a try and got it working (although it required some manual lines in PowerShell, as the GUI doesn't expose some of the advanced settings).

As this was just in a VM, the test performance was rather bad, but I noticed that data is written multiple times, which can sometimes result in the drives being used rather unevenly. As an example: one of the virtual disks had only about 2GB written to it, whereas another had about 4GB. So, whatever distribution algorithm is used (it doesn't look like round-robin, but more like most available physical space first), it's far from how I would have expected a software RAID6 to behave.

I also noticed that it makes rather wasteful use of physical disk space. My test used 8 disks with 50GB each. An mdadm software RAID6 resulted in just short of 300GB of usable space, the Storage Spaces one in only about 250GB, so roughly another 15% "penalty". OK, I guess that's all overhead and such, but even from a software RAID I expected a bit better use of my physical disk space.
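To make the capacity comparison above concrete, here is a minimal sketch of the arithmetic in Python. It assumes the textbook RAID6 rule that two disks' worth of space goes to parity, and it uses the rounded figures from my test (300GB and 250GB). The function names are made up for illustration and are not part of mdadm or Storage Spaces. With these rounded inputs the shortfall comes out nearer 17% than 15%.

```python
def raid6_usable_gb(disk_count: int, disk_size_gb: float) -> float:
    """Usable capacity of a RAID6-style array: two disks' worth is parity."""
    if disk_count < 4:
        raise ValueError("RAID6 needs at least 4 disks")
    return (disk_count - 2) * disk_size_gb


def shortfall_percent(expected_gb: float, observed_gb: float) -> float:
    """How much smaller the observed capacity is, relative to the expected one."""
    return (expected_gb - observed_gb) / expected_gb * 100


# Figures from the test: 8 disks of 50GB each.
expected = raid6_usable_gb(8, 50)   # theoretical RAID6 capacity
observed = 250                      # roughly what the storage space reported

print(f"expected RAID6 capacity: {expected} GB")
print(f"shortfall of the storage space: {shortfall_percent(expected, observed):.1f}%")
```

The "just short of 300GB" from mdadm fits the theoretical value minus a little metadata overhead.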

I then tested what happens if I start to remove drives, and as I had it set up with -PhysicalDiskRedundancy 2 it was able to survive that, and all test data was still available.

So, overall it seems to fit my needs for a software RAID on Windows with RAID6-like fault tolerance that survives a double failure (that is: a second drive failing while rebuilding the first failed one). About the performance: well, it's software RAID, and as I'm currently using fakeRAID (basically a driver-specific software RAID shadowed by the BIOS) there won't be much more impact on system performance than I have right now.

What really made me think thrice about it: there are currently two major issues: a) it can't be mounted on a Linux system (I have not yet tested if and how it can be mounted in a recovery environment), and b) the current Win10 2004 has a lot of issues that have already caused data loss, as reported by some users on different forums.

Why am I asking this: the main "issue" is that I currently don't have the financial means to invest in new/better hardware. I can only use what I currently own. Hence I'm searching for a software solution. I tried WinBTRFS as it claimed to support software RAID for its volumes, but I wasn't able to set it up correctly even with help from its developer. So, the base question boils down to: is using Storage Spaces a viable option if one can't afford hardware RAID or other solutions like virtualization (due to hardware incompatibility)? Sure, I have much of my "really important" data backed up on an external drive, but still: I would rather build a reliable system than go the "I believe that nothing will happen" way.

// update

Just a small update on if and how you can access such a virtual disk via WinPE: I downloaded the current 2004 ADK and created a fresh WinPE image. As I had to use PowerShell to access the information, I copied the instructions found in the ADK PE documentation. After that I created an ISO and booted it in the VM. Without any further commands the storage space was available right from boot. As I read on the MSDN forums, this is only true for client versions of Windows. On Server versions, storage spaces start up in a read-only and detached state (I guess for safety), so in order to read from one, you have to attach it manually. To write to it, obviously, you have to change it from read-only to read-write, but as my question was about how to read data in a recovery environment, writing to such a volume isn't needed for me.

// additional reply

As DarcyThomas suggested in his comment, here's my background on why I currently use a RAID5 and why I think I need to migrate to a safer level like RAID6:

  1. Am I doing it for the small read speed advantage? Although I noticed that the array can stream data a bit faster than one of the drives can on its own, it only really shows when I copy large files, resulting in long sequential reads. When I deal with a lot of small files, which cause a lot of random I/O, the performance sometimes gets worse compared to a single drive. For write speeds it's about the same story. So, to answer this question: no, a speed advantage is surely not what I'm aiming for, hence I'm OK with the even worse "penalties" a RAID6 implies.

  2. Am I doing it as a cheap backup? One would surely try to argue yes. And I certainly benefit from still having all data available if one of the drives fails. Sure, I do have the really important data on another offline drive, so in a catastrophic loss of the array (e.g. due to hardware malfunction or the board going up in smoke) I will still have my important data safe. But I do take advantage of the convenience of not having to worry about a drive failing as much as I would if I used them as single drives (or maybe in another configuration). I have already had two drives fail (both a rather short time after moving, so it's possible that it was physical transport damage both times rather than the drives wearing out), and the rebuild times were quite long (about 14 hours for just 3 TB).

  3. Do I really need that one single large volume? Although this is another debatable question, to keep it short I would simply reply: yes, at least for convenience. I have the array already filled up more than 1/3, and managing such a vast amount of data across multiple drives/volumes would result in chaos (at least for me). Another neat side effect: if someone comes by with new stuff (music, movies, etc.) I can just "dump" it on the array and reorganize and de-duplicate later without having to worry about clogging up one of the drives. I'm someone with a brain like a fly: I would forget I had put data on another drive after a few hours and would take a few more to find it again. Just having it all in one place suits me.

  4. As for "online" backup solutions: yes, I know they're out there. And yes, I also know there are some one can get for free or at least cheap. And sure, I would be able to write myself some small encryptor/decryptor code making use of asymmetric keys to secure the symmetric one rather than using passphrases. And it's not like I don't trust them. But the same holds as in number 3: over time I would simply forget about a few of them. And although I have a rather fast connection (250/50), having all my data spread across the net isn't something I'm looking forward to. But I guess that's just a personal thing.
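As a side note to point 2, the rebuild figure (about 14 hours for 3 TB) can be turned into an average rebuild rate with a few lines of Python. This is only a rough sketch: it assumes decimal terabytes and a constant rate, and the 8 TB drive in the second call is a purely hypothetical example, not something I own.

```python
def rebuild_rate_mb_s(capacity_tb: float, hours: float) -> float:
    """Average rebuild rate in MB/s (decimal units: 1 TB = 1e12 bytes)."""
    return capacity_tb * 1e12 / (hours * 3600) / 1e6


def rebuild_hours(capacity_tb: float, rate_mb_s: float) -> float:
    """Hours needed to rebuild a drive of the given size at a constant rate."""
    return capacity_tb * 1e12 / (rate_mb_s * 1e6) / 3600


# Observed: about 14 hours for a 3 TB drive.
rate = rebuild_rate_mb_s(3, 14)
print(f"average rebuild rate: {rate:.1f} MB/s")

# Hypothetical: the same rate applied to an 8 TB drive.
print(f"8 TB at that rate: {rebuild_hours(8, rate):.1f} hours")
```

The longer the rebuild window, the longer a RAID5 array runs without any redundancy, which is the main reason I want the second parity drive.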

So, to summarize: moving on from a 5-drive RAID5 to an 8-drive RAID6 is just the next logical step for me. The investment will be rather low (just the additional drives plus one or two simple HBAs) and, done right, it shouldn't depend on proprietary stuff like the one I'm using right now. Yes, I figured out how to access a storage space from a recovery environment, but this requires its proprietary spec to stay the same, without sudden changes causing incompatibilities (like the chaos with just Office documents). Maybe this addition will help others with their replies in the future.

Best Answer

Windows Parity Spaces are dog slow and (according to Microsoft) aren’t designed for anything except archive workloads. Microsoft keeps trying to improve write performance, for example by implementing a log that hardware RAIDs lack, but the lack of a battery-powered write-back cache takes away all the fun. You can, however, try to improve writes by telling Spaces you have a UPS.

https://docs.microsoft.com/en-us/windows-server/storage/storage-spaces/deploy-standalone-storage-spaces

Set-StoragePool -FriendlyName <StoragePoolName> -IsPowerProtected $True

Another option is to combine ReFS and Storage Spaces into so-called mirror-accelerated parity: writes will end up in the SSD tier first, to die on the HDD tier later.

https://docs.microsoft.com/en-us/windows-server/storage/refs/mirror-accelerated-parity

http://knowledgebase.45drives.com/kb/kb450193-creating-mirror-accelerated-parity-volumes-and-storage-tiers-in-storage-spaces-windows-server-2019/

Unfortunately, this isn’t a 100% supported scenario for anything except Storage Spaces Direct (which is a can of worms of its own).

I’d suggest Linux MDRAID+XFS due to its stellar stability and lots of proven deployments, or an old-stock LSI hardware RAID card from eBay if you absolutely need to stick with a Windows Server OS.
