Risk of FreeNAS as Virtual File Server

file-sharing, filesystems, truenas, virtualization, zfs

There is much debate as to whether FreeNAS can be run as a virtual machine.

The official position is that it can be, but additional configuration is required.

If I cannot guarantee that I can follow these recommendations, am I more vulnerable to failure – especially catastrophic failure – than if I run a vanilla Linux system with EXT4/XFS, or FreeBSD with UFS?

Specifically, assume I won't be able to do PCI passthrough, nor can I disable write caching. Furthermore, I will only have one vDisk for storage (a VMDK backed by hardware RAID), so no RAIDZ. Obviously, there will be backups.

EDIT: To clarify why I want to do this – I need a file server, and this is the infrastructure I have to work with. If I need to, I could get extra vDisks to set up RAIDZ, but otherwise that's it. I was looking for a good file server solution, and FreeNAS seemed to fit the bill – except that there were all of these dire warnings about virtualizing ZFS and how you can lose all of your data and corrupt your backups.

I realize that deploying FreeNAS on this infrastructure is risky. My question is: is it riskier than the alternatives?

EDIT2: I seem to be having trouble communicating my intent. FreeNAS with ZFS is a rock-solid NAS platform. However, from what I have read, it seems that the features that make ZFS more reliable as a bare-metal file server can actually work against you when you run it on a standard VM configuration. If so, then a different file system is the better choice under standard VM settings (i.e. no direct I/O, write cache enabled). Is this a correct assessment?

Best Answer

General answer

If I cannot guarantee that I can follow these recommendations, am I more vulnerable to failure - especially catastrophic failure - than if I run a vanilla Linux system with EXT4/XFS, or FreeBSD with UFS?

The risks are different and not directly comparable.

  • I would always prefer a ZFS system, even without redundant vdevs, if only for the knowledge of data integrity (even if I have to restore from backup, I like to know that I have to restore from backup, instead of suffering silent corruption I am not even aware of). Also, features like send/recv or snapshots make your life much easier and have nothing to do with integrity (see the sketch after this list).
  • Speaking of catastrophic failure, only backups will protect you from that, and you need them even if your normal system is highly reliable. So it makes sense to start with the backups (as you already did) to get that out of the way, and then think about what other quality of service you require and which downsides you can live with.
  • In theory, more complex systems are more error-prone, but as all of the mentioned file systems are over 10 years old and actively used and maintained, I would say the majority of bugs have been ironed out already (which does not mean there are none left, of course).
  • One may argue that copy-on-write file systems are inherently safer because they never overwrite live data and therefore cannot corrupt it. I assume this advantage is more theoretical than practical and is much more influenced by other things, like the actual implementation and the handling of metadata.
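To illustrate the snapshot and send/recv point from the first bullet, here is a minimal sketch that drives the standard zfs commands from Python's subprocess module; the pool and dataset names (tank/data, backup/data) and the snapshot name are hypothetical placeholders, not anything from your setup:

    # Minimal sketch: snapshot a dataset and replicate it with zfs send/recv.
    # "tank/data", "backup/data" and "nightly" are placeholder names.
    import subprocess

    def snapshot_and_replicate(dataset="tank/data", target="backup/data",
                               snapname="nightly"):
        snap = f"{dataset}@{snapname}"
        subprocess.run(["zfs", "snapshot", snap], check=True)

        # Pipe the snapshot stream straight into zfs recv on the target.
        send = subprocess.Popen(["zfs", "send", snap], stdout=subprocess.PIPE)
        subprocess.run(["zfs", "recv", "-F", target],
                       stdin=send.stdout, check=True)
        send.stdout.close()
        if send.wait() != 0:
            raise RuntimeError("zfs send failed")

    if __name__ == "__main__":
        snapshot_and_replicate()

Run regularly (from cron, for example), something like this gives you cheap point-in-time copies regardless of how the integrity questions above are settled.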

Specific to your case

If you look at the referenced recommendations and dissect them, you notice a few things:

  1. If you are not using PCI passthrough (more on that below), then you must disable the scrub tasks in ZFS. The hardware can “lie” to ZFS so a scrub can do more damage than good, possibly even permanently destroying your zpool.

A scrub just reads every block of the underlying vdevs and verifies their checksums. If your virtual disk does not cope with this, it is garbage and you should be concerned about it, not about ZFS. On the other hand, if your virtual disks are already checksummed on the SAN, the additional scrub will do nothing except cause additional I/O (it is useless).
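If you do keep scrubs enabled, checking what the last one actually found is cheap. A minimal sketch, again just wrapping the standard zpool commands in Python; the pool name tank is a placeholder:

    # Minimal sketch: start a scrub and report the pool status.
    # "tank" is a placeholder pool name.
    import subprocess

    POOL = "tank"

    # zpool scrub returns immediately; the scrub itself runs in the background
    # and reads every block in the pool, verifying checksums as it goes.
    subprocess.run(["zpool", "scrub", POOL], check=True)

    # Query progress and any errors found so far.
    status = subprocess.run(["zpool", "status", POOL],
                            capture_output=True, text=True, check=True)
    print(status.stdout)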

  2. The second precaution is to disable any write caching that is happening on the SAN, NAS, or RAID controller itself. A write cache can easily confuse ZFS about what has or has not been written to disk. This confusion can result in catastrophic pool failures.

This is good advice if you don't trust the hardware. The downside is considerably lower performance, of course. You may also have no control over the SAN settings, so you need to treat it like a cheap disk you bought from eBay and slapped into your system - anything can happen, at least in theory.
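To make the caching problem concrete: a flush request is essentially the only durability signal the guest ever sends, and a cache that acknowledges it before the data is on stable media undermines ZFS and EXT4/XFS/UFS alike. A minimal sketch of that contract (the path is a placeholder):

    # Minimal sketch: the durability contract a write cache can break.
    # After fsync() returns, the application assumes the data has reached
    # non-volatile storage; a controller that acknowledges the flush while
    # the data still sits in a volatile cache voids that assumption for
    # any file system, ZFS included.
    import os

    path = "/mnt/tank/important.bin"  # placeholder path

    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        os.write(fd, b"critical record\n")
        os.fsync(fd)  # ask the OS (and ultimately the disk) to flush
    finally:
        os.close(fd)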

  3. Using a single disk leaves you vulnerable to pool metadata corruption which could cause the loss of the pool. To avoid this, you need a minimum of three vdevs, either striped or in a RAIDZ configuration. Since ZFS pool metadata is mirrored between three vdevs if they are available, using a minimum of three vdevs to build your pool is safer than a single vdev. Ideally vdevs that have their own redundancy are preferred.

This is okay as general advice, but a bit of a nitpick. Assuming your SAN is bad, this will help you in certain cases (with some luck, at least). Assuming your SAN is good, it does nothing and just costs you space and performance. In my opinion, it is much better to make sure that the chain from physical disks to SAN to network to VM host to VM guest is equally good, so you don't have to repeat the same safeguards in each layer.
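For completeness, the difference between the single-disk layout you describe and the layouts the recommendation asks for is only in the pool creation step. A purely illustrative sketch; the pool name and device paths are placeholders, and actually running it would of course wipe those disks:

    # Minimal sketch: the pool layouts the recommendation contrasts.
    # "tank" and the /dev/daN device paths are placeholders.
    import subprocess

    def create_pool(*layout):
        """Create a pool named 'tank' with the given layout arguments."""
        subprocess.run(["zpool", "create", "tank", *layout], check=True)

    # a) The plan from the question: one virtual disk, no ZFS-level redundancy.
    single_vdev = ["/dev/da1"]

    # b) Three single-disk vdevs, striped; ZFS spreads its metadata copies
    #    across the vdevs, which is what the recommendation is after.
    striped = ["/dev/da1", "/dev/da2", "/dev/da3"]

    # c) Three disks in one RAIDZ group, which additionally survives the
    #    loss of any single disk.
    raidz = ["raidz", "/dev/da1", "/dev/da2", "/dev/da3"]

    if __name__ == "__main__":
        create_pool(*striped)  # pick one layout; destructive to the disks!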


FreeNAS vs others

A word about the FreeNAS recommendations - they are certainly okay as recommendations, that is, guidelines or tips for the general audience. If you follow them, you will not be worse off than otherwise, and might even be better off. Then again, they are sternly worded, as seems to be the usual tone in the FreeNAS community (judging from certain forum posters, at least). I guess they just want to be on the safe side with that. I have always preferred the ZFS Best Practices guide, because it is worded quite neutrally and just presents the facts, leaving it up to you to decide.

It's also interesting that, according to the FreeNAS docs and forums, you will die a gruesome death if you dare to run a ZFS system for file services with less than a pitiful 4 GB of RAM, while on the mailing lists of OmniOS (or SmartOS or illumos or Nexenta, I don't remember at the moment) people tested systems with 512 MB of RAM and shared their suggestions on how to configure them. All in all, it was more about knowledge of the details, and the choice was left to each person instead of establishing rules that thou shalt follow.

Over time, this problem will also become less important and the recommendations will change, as more and more systems switch over to ZFS in normal desktop and server editions. Ubuntu has already done it, and others will surely follow. If in two or three years 80% of distributions use ZFS or btrfs, most of those installations will run virtualized, so the point will be moot.