ZFS – How to Perform Incremental and Continuous Backups

backupincremental-backupzfszfsonlinux

How can zfs pools be continuously/incrementally backed up offsite?

I recognise the send/receive over ssh is one method however that involves having to manage snapshots manually.

There are some tools I have found however most are no longer supported.

The one tool which looks promising is https://github.com/jimsalterjrs/sanoid however I am worried that non-widely known tool may do more harm then good in that it may corrupt/delete data.

How are continuous/incremental zfs backups performed?

Best Answer

ZFS is an incredible filesystem and solves many of my local and shared data storage needs.

While, I do like the idea of clustered ZFS wherever possible, sometimes it's not practical, or I need some geographical separation of storage nodes.

One of the use cases I have is for high-performance replicated storage on Linux application servers. For example, I support a legacy software product that benefits from low-latency NVMe SSD drives for its data. The application has an application-level mirroring option that can replicate to a secondary server, but it's often inaccurate and is a 10-minute RPO.

I've solved this problem by having a secondary server (also running ZFS on similar or dissimilar hardware) that can be local, remote or both. By combining the three utilities detailed below, I've crafted a replication solution that gives me continuous replication, deep snapshot retention and flexible failover options.

zfs-auto-snapshot - https://github.com/zfsonlinux/zfs-auto-snapshot

Just a handy tool to enable periodic ZFS filesystem level snapshots. I typically run with the following schedule on production volumes:

# /etc/cron.d/zfs-auto-snapshot

PATH="/usr/bin:/bin:/usr/sbin:/sbin"

*/5 * * * * root /sbin/zfs-auto-snapshot -q -g --label=frequent --keep=24 //
00 * * * * root /sbin/zfs-auto-snapshot -q -g --label=hourly --keep=24 //
59 23 * * * root /sbin/zfs-auto-snapshot -q -g --label=daily --keep=14 //
59 23 * * 0 root /sbin/zfs-auto-snapshot -q -g --label=weekly --keep=4 //
00 00 1 * * root /sbin/zfs-auto-snapshot -q -g --label=monthly --keep=4 //

Syncoid (Sanoid) - https://github.com/jimsalterjrs/sanoid

This program can run ad-hoc snap/replication of a ZFS filesystem to a secondary target. I only use the syncoid portion of the product.

Assuming server1 and server2, simple command run from server2 to pull data from server1:

#!/bin/bash

/usr/local/bin/syncoid root@server1:vol1/data vol2/data

exit $?

Monit - https://mmonit.com/monit/

Monit is an extremely flexible job scheduler and execution manager. By default, it works on a 30-second interval, but I modify the config to use a 15-second base time cycle.

An example config that runs the above replication script every 15 seconds (1 cycle)

check program storagesync with path /usr/local/bin/run_storagesync.sh
        every 1 cycles
        if status != 0 then alert

This is simple to automate and add via configuration management. By wrapping the execution of the snapshot/replication in Monit, you get centralized status, job control and alerting (email, SNMP, custom script).


The result is that I have servers that have multiple months of monthly snapshots and many points of rollback and retention within: https://pastebin.com/zuNzgi0G - Plus, a continuous rolling 15-second atomic replica:

# monit status

Program 'storagesync'
  status                            Status ok
  monitoring status                 Monitored
  last started                      Wed, 05 Apr 2017 05:37:59
  last exit value                   0
  data collected                    Wed, 05 Apr 2017 05:37:59
.
.
.
Program 'storagesync'
  status                            Status ok
  monitoring status                 Monitored
  last started                      Wed, 05 Apr 2017 05:38:59
  last exit value                   0
  data collected                    Wed, 05 Apr 2017 05:38:59