ZFS on single virtual disk

btrfs · file-server · truenas · zfs

  • How do companies manage large file servers (e.g. 17 TB) and their related
    backups on a very tight budget?
  • Is it ok to use ZFS (or BTRFS) on a single virtual disk for its copy-on-write nature to eliminate the need for fsck? (i.e. not for its RAID, snapshot, etc. features)

Specific situation:

We need to retire an ancient problematic storage system which was serving NFS storage for virtuals and is still our main file server. I now have a new 40TB iSCSI FreeNAS-based storage system and have already vMotion’ed all virtuals from old storage to new, but the 17 TB of SMB/CIFS & AFP shared files remain.

Backups are done via rsync, which takes too long to scan 17 TB, so we’ve split the data into two volumes:

  • 13 TB on a read-only “archive” volume for files that haven’t been
    modified for a year. To back up, we simply make onsite and offsite
    copies using rsync – no need for daily backup versions.
  • 4 TB of live writable storage. This is backed up daily with rsync,
    using hard links to create “snapshots” / versions of the file
    server’s state on that particular day, so that we can recover
    overwritten files.
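
For reference, this kind of versioned backup is typically built on rsync’s --link-dest option. A minimal sketch, with hypothetical paths:

    # Hard-link "snapshot" backups with rsync; all paths are illustrative.
    today=$(date +%F)
    # Unchanged files become hard links into the previous snapshot, so each
    # daily tree only costs the space of the files that actually changed.
    rsync -a --delete --link-dest=/backup/live/latest \
        /srv/live/ "/backup/live/$today/"
    # Repoint "latest" at the newest snapshot for the next run.
    ln -sfn "/backup/live/$today" /backup/live/latest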

IT staff move files from archive to live when modifications are required, but these requests now come in multiple times a day, which is no longer acceptable.

Original plan:

  • Create a virtual Linux file server.
  • Create 2x virtual disks on our new 40TB storage system. (13 TB archive + 4 TB writable live)
  • Format above disks with ext4 file systems
  • Use UnionFS with its cow (copy-on-write) mount option to present a single unified view of the above file systems to users, with all writes going to the 4 TB writable volume.
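
For illustration, with unionfs-fuse the union mount in the last step might look like this (mount points are hypothetical; the binary is named unionfs on some distributions):

    # Copy-on-write union: reads fall through to the read-only archive,
    # while all writes land on the 4 TB live branch.
    unionfs-fuse -o cow,allow_other /mnt/live=RW:/mnt/archive=RO /srv/share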

Problems with original plan:

  • We are outgrowing the file-based nature of rsync as a backup tool: renaming a 1 TB top-level folder results in a complete new copy of the entire 1 TB folder, whereas a block-based backup would only add a few KB.
  • Implementing UnionFS just adds yet another layer of complication to the whole system, which I would really like to avoid.
  • Due to rsync having to rescan the whole file server to compare changed files, it takes a very long time to complete backups, even if only a few files were changed.
  • It requires us to do continuous archiving to keep the writable volume small enough for backups to complete overnight.

Alternative plan 1: One large volume and ZFS snapshots for backups

  • Create the planned Linux file server above, but with one large volume instead of two, eliminating the need for UnionFS
  • To back up this very large server, which would take too long with rsync, use ZFS snapshots instead and “zfs send” to replicate them offsite.
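
The replication cycle might look like the following sketch, assuming a hypothetical dataset tank/files and an offsite host backup-host:

    # Take a named snapshot of the file server dataset.
    zfs snapshot tank/files@daily-1
    # First run: send the full snapshot to a pool on the offsite host.
    zfs send tank/files@daily-1 | ssh backup-host zfs receive backup/files
    # Later runs: send only the blocks changed since the previous
    # snapshot, no matter how many files were renamed or moved.
    zfs snapshot tank/files@daily-2
    zfs send -i @daily-1 tank/files@daily-2 | \
        ssh backup-host zfs receive -F backup/files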

Problem with alternative plan 1:

  • fsck on a 17 TB ext4 file system would take days! Imagine a Monday-morning incident requiring a restart and forced fsck at boot, or worse, a file system corruption!

Alternative plan 2: ZFS/BTRFS file system on Linux server

  • As per alternative plan 1, but use a copy-on-write file system such as ZFS or BTRFS instead of ext4, because these don’t require fsck.

Questions/concerns for alternative plan 2:

  • Both ZFS and BTRFS want direct access to raw disks to implement their own RAID and are therefore not normally used on virtual disks. How well will this work on a single virtual disk?

Alternative plan 3: FreeNAS as file server directly

  • Instead of having a separate virtual file server, share files from FreeNAS directly

Problem with alternative plan 3:

  • We need to install various packages and custom perl/bash/python scripts to integrate this file server with our job tracking system. This will be fine on straight Linux, but I don't think it's a good idea on a pre-packaged FreeBSD-based ZFS storage system (FreeNAS). Updates may overwrite our changes, etc.

Questions:

  • Is alternative plan 2 – ZFS on single virtual disk – a good idea? If not, why?
  • Can anyone suggest any better options?

Best Answer

How do companies manage large file servers (e.g. 17 TB) and their related backups on a very tight budget?

It depends on the performance and budget constraints. For example, Backblaze (a cloud backup company) has a very tight budget for each TB, but does not need top performance or response times for the data. Another company might need performance on the cheap, and instead finds ways to reduce the data it must keep (with deduplication, pruning of old backup data, or by simply removing data that the business may not need).

Is it ok to use ZFS (or BTRFS) on a single virtual disk for its copy-on-write nature to eliminate the need for fsck? (i.e. not for its RAID, snapshot, etc. features)

I would not use BTRFS for anything outside development where you need to trust your data safety as a first principle. I would use ZFS because I have not found a cheaper yet equally safe alternative to it: other file systems with similar features from IBM, NetApp, etc. are more expensive, and other free file systems are either not mature enough (HAMMER2, BTRFS) or lack essential features (ext2/3/4, ReiserFS, etc.).

To your specific questions: I would prefer plan 3, but plan 2 would also work.

Contrary to much FUD floating around the web, ZFS does not care whether the underlying storage is physical, virtual or mixed. Of course, it can only be as good/fast/safe as that underlying storage, and can only reason about what it is given. This means your performance will not be as good as on native disks, troubleshooting involves both layers, and both layers can have problems. If you know this and provision for these downsides, I see it as a viable alternative.

You still have comfort features like send/recv, snapshots, CoW, checksums and block-level deduplication. What you sacrifice is mostly performance and possibly safety (if your SAN is just one single disk, for example). You should also tune the ZFS sector size (ashift) to your underlying storage when creating the pool. You can still tune the record size of individual file systems afterwards, but ashift is a pool-creation setting that cannot be reversed without destroying the pool.
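
As a concrete sketch of that tuning, assuming the virtual disk appears as /dev/sdb and the backing storage uses 4K sectors (all names are hypothetical):

    # ashift is fixed per pool/vdev at creation time; 2^12 = 4K sectors.
    zpool create -o ashift=12 tank /dev/sdb
    # recordsize is a per-file-system property and can be changed later.
    zfs create -o recordsize=128K tank/files
    zfs set recordsize=16K tank/files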

But I would first thoroughly evaluate whether rewriting those scripts and integrations is as time-consuming as you imagine. Also, even appliances like FreeNAS (or napp-it, to name an alternative) usually have user scripts or user-definable plugins or modules that survive updates and work well with the appliance. (The new FreeNAS 10, aka Corral, replaced its plugin architecture with Docker; if Docker is something you are familiar with, that might be an alternative.)