How can ZFS pools be continuously/incrementally backed up offsite?
I recognise that send/receive over SSH is one method; however, that involves having to manage snapshots manually.
There are some tools I have found; however, most are no longer supported.
The one tool that looks promising is sanoid (https://github.com/jimsalterjrs/sanoid). However, I am worried that a tool that is not widely known may do more harm than good, in that it may corrupt or delete data.
How are continuous/incremental ZFS backups performed?
Best Answer
ZFS is an incredible filesystem and solves many of my local and shared data storage needs.
While I do like the idea of clustered ZFS wherever possible, sometimes it's not practical, or I need some geographical separation of storage nodes.
One of the use cases I have is for high-performance replicated storage on Linux application servers. For example, I support a legacy software product that benefits from low-latency NVMe SSD drives for its data. The application has an application-level mirroring option that can replicate to a secondary server, but it's often inaccurate and has a 10-minute RPO (recovery point objective).
I've solved this problem by having a secondary server (also running ZFS on similar or dissimilar hardware) that can be local, remote or both. By combining the three utilities detailed below, I've crafted a replication solution that gives me continuous replication, deep snapshot retention and flexible failover options.
zfs-auto-snapshot - https://github.com/zfsonlinux/zfs-auto-snapshot
Just a handy tool to enable periodic ZFS filesystem-level snapshots. I typically run with the following schedule on production volumes:
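As an illustration only, a cron schedule for zfs-auto-snapshot could be staged with a small script like the one below. The labels, intervals and `--keep` counts are assumptions chosen for the example, as are the `CRON_FILE` variable and its default path; they are not necessarily the schedule used on these servers.

```shell
#!/bin/sh
# Sketch only: writes an illustrative cron schedule for zfs-auto-snapshot
# to a local file so it can be reviewed before being copied to /etc/cron.d/.
# CRON_FILE and every interval/retention value below are assumptions.
CRON_FILE="${CRON_FILE:-./zfs-auto-snapshot.cron}"

cat > "$CRON_FILE" <<'EOF'
PATH="/usr/local/sbin:/usr/sbin:/sbin:/usr/bin:/bin"
# -q = quiet, -g = log to syslog, "//" = all ZFS datasets
*/15 * * * * root zfs-auto-snapshot -q -g --label=frequent --keep=4 //
0 * * * *    root zfs-auto-snapshot -q -g --label=hourly --keep=24 //
59 23 * * *  root zfs-auto-snapshot -q -g --label=daily --keep=31 //
0 0 1 * *    root zfs-auto-snapshot -q -g --label=monthly --keep=12 //
EOF
```

Each label gets its own retention count, so frequent snapshots age out quickly while monthly ones persist for longer.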
Syncoid (Sanoid) - https://github.com/jimsalterjrs/sanoid
This program can run an ad-hoc snapshot and replication of a ZFS filesystem to a secondary target. I only use the syncoid portion of the product.
Assuming two hosts, server1 and server2, a simple command run from server2 pulls data from server1:
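A minimal sketch of such a pull, wrapped in a function so the exit status can be handed to whatever schedules it. The dataset names `vol1/data` and `vol2/data`, the `root` login, and the `SRC`, `DST` and `SYNCOID` variables are placeholders for illustration, not values from a real setup.

```shell
#!/bin/sh
# Sketch of a pull-replication wrapper intended to run on server2.
# All names below are hypothetical placeholders.
SRC="${SRC:-root@server1:vol1/data}"   # remote source: user@host:dataset
DST="${DST:-vol2/data}"                # local target dataset
SYNCOID="${SYNCOID:-syncoid}"          # path to the syncoid executable

replicate() {
    # syncoid takes its own sync snapshot on the source and transfers it
    # to the target over SSH. Its exit status becomes the function's
    # return value, so a scheduler can alert on failure.
    "$SYNCOID" "$SRC" "$DST"
}
```

To use it as a standalone script, add a final `replicate` line; the script then exits with syncoid's status.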
Monit - https://mmonit.com/monit/
Monit is an extremely flexible job scheduler and execution manager. By default, it works on a 30-second interval, but I modify the config to use a 15-second base time cycle.
An example config that runs the above replication script every 15 seconds (1 cycle):
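A Monit fragment along these lines could express that, staged here to a local file for review. The check name `storagesync`, the script path `/usr/local/bin/syncoid-pull.sh` and the `MONIT_FILE` variable are hypothetical, and `set daemon` would normally live in the main `monitrc` rather than in an included file.

```shell
#!/bin/sh
# Sketch only: writes an illustrative Monit fragment to a local file.
# MONIT_FILE, the check name and the script path are assumptions.
MONIT_FILE="${MONIT_FILE:-./storagesync.monitrc}"

cat > "$MONIT_FILE" <<'EOF'
# Base poll cycle of 15 seconds
set daemon 15

# Run the replication wrapper once per cycle; alert on a non-zero exit
check program storagesync with path "/usr/local/bin/syncoid-pull.sh"
    every 1 cycles
    if status != 0 then alert
EOF
```

Running `monit -t` against the final configuration checks the syntax before Monit is reloaded.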
This is simple to automate and add via configuration management. By wrapping the execution of the snapshot/replication in Monit, you get centralized status, job control and alerting (email, SNMP, custom script).
The result is servers with multiple months of monthly snapshots and many points of rollback and retention within them (https://pastebin.com/zuNzgi0G), plus a continuous, rolling 15-second atomic replica:
# monit status