ZFS – How Can the Send Size Estimate Be Larger Than the Total Data Size?

From zfs send -R -v pool/fs@snap:

send from @ to pool/fs@snap estimated size is 6.50T

…but from zpool list:

NAME   SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
pool  3.62T  2.36T  1.27T    65%  2.87x  ONLINE  -

Can a zfs send stream really be several times larger than the pool from which it's taken?

Observed w/ ZFS on Linux 0.6.1.

Best Answer

As tegbains pointed out in a comment, zfs send streams do not benefit from any storage-level deduplication in place. Nor do they carry over other storage-level settings such as compression; this is why zfs send | zfs receive can be used to migrate data to new settings that would otherwise only take effect once the data is rewritten, such as enabling or disabling deduplication or changing the compression algorithm.

This is the main reason your zfs send stream is so much bigger than the allocated storage space. In the specific case of deduplication, a likely reason for this design (beyond the principle of least surprise, if you need one) is that deduplication, especially in ZFS, is very costly, and a decision was made that zfs send streams should be receivable on lower-spec'd systems.

Your data shows about 2.36 TB allocated, with an overall deduplication ratio of 2.87x. Naively multiplying these two numbers yields 6.77 TB, which is close enough to the estimated 6.50 TB to be a reasonable ballpark figure. It is worth noting, though, that the 6.50 TB figure relates to a snapshot of one file system, whereas the 2.36 TB × 2.87 figure relates to the entire pool.
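The back-of-the-envelope calculation above can be reproduced with a one-line awk expression. This is only a sketch: the helper name naive_logical_size is hypothetical, and the estimate deliberately ignores compression and redundancy overhead, both of which also affect how ALLOC relates to the logical size of the data.

```shell
# Hypothetical helper: naive logical-size estimate = allocated space x dedup ratio.
# Ignores compression, metadata and redundancy overhead (sketch only).
naive_logical_size() {
  awk -v alloc="$1" -v ratio="$2" 'BEGIN { printf "%.2f\n", alloc * ratio }'
}

# ALLOC (in TB) and DEDUP taken from the zpool list output in the question.
naive_logical_size 2.36 2.87   # prints 6.77
```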

If your ZFS implementation supports it, you may have some luck with zfs send -D (generate a deduplicated zfs send stream).

You mention that this was observed with ZFS on Linux 0.6.1. Not directly related to your question, but I would suggest upgrading: stable ZoL is at 0.6.4.1 as of this writing (June 2015), and there have been numerous enhancements and fixes since 0.6.1 came out in March 2013.

Related Solutions

Freebsd – Reducing ZFS stream size for offsite backup

I know this is a really old question, but I've seen it in a few different places. There has always been some confusion about the values shown by zfs list as they pertain to using zfs send|recv. The problem is that the USED value is actually an estimate of the amount of space that would be released if that single snapshot were deleted, bearing in mind that earlier and later snapshots may reference the same data blocks.

Example:

zfs list -t snapshot -r montreve/cev-prod | grep 02-21
NAME                                      USED  AVAIL  REFER  MOUNTPOINT
montreve/cev-prod@2018-02-21_00-00-01     878K      -   514G  -
montreve/cev-prod@2018-02-21_sc-daily     907K      -   514G  -
montreve/cev-prod@2018-02-21_01-00-01    96.3M      -   514G  -
montreve/cev-prod@2018-02-21_02-00-01    78.5M      -   514G  -
montreve/cev-prod@2018-02-21_03-00-01    80.3M      -   514G  -
montreve/cev-prod@2018-02-21_04-00-01    84.0M      -   514G  -
montreve/cev-prod@2018-02-21_05-00-01    84.2M      -   514G  -
montreve/cev-prod@2018-02-21_06-00-01    86.7M      -   514G  -
montreve/cev-prod@2018-02-21_07-00-01    94.3M      -   514G  -
montreve/cev-prod@2018-02-21_08-00-01     101M      -   514G  -
montreve/cev-prod@2018-02-21_09-00-01     124M      -   514G  -

To find out how much data will need to be transferred to reconstitute a snapshot via zfs send|recv, you'll need to use the dry-run option (-n), together with -v to print the estimates. For the snapshots listed above, try:

zfs send -nv -I montreve/cev-prod@2018-02-21_00-00-01 montreve/cev-prod@2018-02-21_09-00-01
send from @2018-02-21_00-00-01 to montreve/cev-prod@2018-02-21_sc-daily estimated size is 1.99M
send from @2018-02-21_sc-daily to montreve/cev-prod@2018-02-21_01-00-01 estimated size is 624M
send from @2018-02-21_01-00-01 to montreve/cev-prod@2018-02-21_02-00-01 estimated size is 662M
send from @2018-02-21_02-00-01 to montreve/cev-prod@2018-02-21_03-00-01 estimated size is 860M
send from @2018-02-21_03-00-01 to montreve/cev-prod@2018-02-21_04-00-01 estimated size is 615M
send from @2018-02-21_04-00-01 to montreve/cev-prod@2018-02-21_05-00-01 estimated size is 821M
send from @2018-02-21_05-00-01 to montreve/cev-prod@2018-02-21_06-00-01 estimated size is 515M
send from @2018-02-21_06-00-01 to montreve/cev-prod@2018-02-21_07-00-01 estimated size is 755M
send from @2018-02-21_07-00-01 to montreve/cev-prod@2018-02-21_08-00-01 estimated size is 567M
send from @2018-02-21_08-00-01 to montreve/cev-prod@2018-02-21_09-00-01 estimated size is 687M
total estimated size is 5.96G
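Since the -I dry run prints one line per increment, the per-line figures can be summed to sanity-check the total. The sketch below assumes the output format shown above, with 1024-based K/M/G/T suffixes; sum_send_estimates is a hypothetical helper name, and because the per-line figures are already rounded, the result is approximate.

```shell
# Hypothetical helper: sum the per-increment "estimated size is N<unit>" lines
# of `zfs send -nv` output read from stdin, skipping the "total ..." line.
# Assumes 1024-based K/M/G/T suffixes (a bare number is treated as bytes).
sum_send_estimates() {
  awk '
    /estimated size is/ && !/^total/ {
      v = $NF                          # e.g. "624M" or "1.99M"
      n = v + 0                        # numeric prefix: "624M" -> 624
      u = substr(v, length(v), 1)      # trailing unit letter, if any
      if      (u == "K") n *= 1024
      else if (u == "M") n *= 1024 ^ 2
      else if (u == "G") n *= 1024 ^ 3
      else if (u == "T") n *= 1024 ^ 4
      bytes += n
    }
    END { printf "%.2fG\n", bytes / 1024 ^ 3 }
  '
}
```

Applied to the ten per-increment lines above, this gives 5.96G, matching the reported total. In practice it would be used as zfs send -nv -I <first> <last> 2>&1 | sum_send_estimates; the 2>&1 is there because, depending on the version, the verbose dry-run output may be written to stderr rather than stdout.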

Yikes! That's a whole heck of a lot more than the USED values. However, if you don't need all of the intermediate snapshots at the destination, you can use the consolidating option (-i rather than -I), which calculates the necessary differential between any two snapshots even if there are others in between.

zfs send -nv -i montreve/cev-prod@2018-02-21_00-00-01 montreve/cev-prod@2018-02-21_09-00-01
send from @2018-02-21_00-00-01 to montreve/cev-prod@2018-02-21_09-00-01 estimated size is 3.29G
total estimated size is 3.29G

This isolates the blocks that were rewritten between the two snapshots, so only their final state is sent.
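To script the consolidated (-i) dry run for a whole day of snapshots, the oldest and newest snapshot names can be taken from zfs list sorted by creation time. This is a sketch under stated assumptions: the helper names first_and_last and day_dry_run are hypothetical, and the snapshots of interest are assumed to share a common name fragment such as @2018-02-21_.

```shell
# Hypothetical helper: print the first and last line of stdin on one line.
first_and_last() {
  awk 'NR == 1 { first = $0 } { last = $0 } END { if (NR) print first, last }'
}

# Hypothetical helper: dry-run a single -i send from the oldest to the newest
# snapshot of dataset $1 whose name contains $2 (e.g. "@2018-02-21_").
day_dry_run() {
  pair=$(zfs list -H -t snapshot -o name -s creation -d 1 "$1" |
         grep -F -- "$2" | first_and_last)
  [ -n "$pair" ] || { echo "no matching snapshots" >&2; return 1; }
  # ZFS snapshot names cannot contain spaces, so splitting on one is safe.
  zfs send -nv -i "${pair% *}" "${pair#* }"
}
```

With the snapshots listed earlier, day_dry_run montreve/cev-prod @2018-02-21_ would be expected to issue the same -i command as above.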

But that's not the whole story! zfs send extracts the logical data from the source, so if compression is enabled on the source file system, the estimates are based on the uncompressed data that will need to be sent. For example, writing one incremental stream to disk produces a file close to the dry-run estimate (682M versus the estimated 687M):

zfs send -i montreve/cev-prod@2018-02-21_08-00-01 montreve/cev-prod@2018-02-21_09-00-01 > /montreve/temp/cp08-09.snap
-rw-r--r--  1 root root    682M Feb 22 10:07 cp08-09.snap

But if you pass it through gzip, we see that the data is significantly compressed:

zfs send -i montreve/cev-prod@2018-02-21_08-00-01 montreve/cev-prod@2018-02-21_09-00-01 | gzip > /montreve/temp/cp08-09.gz
-rw-r--r--  1 root root    201M Feb 22 10:08 cp08-09.gz
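The gzip result implies roughly a 3.4:1 ratio for this increment. The arithmetic, plus a hypothetical helper that measures both sizes without keeping the stream on disk, might look like the sketch below; size_ratio and stream_sizes are illustrative names, and stream_sizes generates the whole incremental stream twice. Later OpenZFS releases added zfs send -c, which sends blocks as they are compressed on disk, but that option is not present in the 0.6.5 module used here.

```shell
# Hypothetical helper: ratio of two sizes given in the same unit.
size_ratio() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2fx\n", a / b }'
}

# Hypothetical helper: byte size of the raw and gzip-compressed incremental
# stream from snapshot $1 to snapshot $2, without writing either to disk.
# Note: this generates the full stream twice.
stream_sizes() {
  raw=$(zfs send -i "$1" "$2" | wc -c)
  gz=$(zfs send -i "$1" "$2" | gzip | wc -c)
  echo "raw=$raw gzip=$gz ratio=$(size_ratio "$raw" "$gz")"
}

size_ratio 682 201   # file sizes (M) from the ls output above; prints 3.39x
```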

Side note: this is based on OpenZFS on Linux, module version v0.6.5.6-0ubuntu16 ("ZFS: Loaded module v0.6.5.6-0ubuntu16").

You will find some references to optimisations that can be applied to the send stream (-D for a deduplicated stream, -e for a more compact stream), but with this version I haven't observed any impact on the size of the streams generated from my datasets.

ZFS on FreeBSD – Recovery from Data Corruption

The problem was that the new motherboard's BIOS created a host protected area (HPA) on some of the drives: a small section, usually located at the end of the hard drive, that OEMs use for system recovery purposes.

ZFS maintains four labels with partition metadata, two at the start and two at the end of the device. Because the HPA hides the end of the disk, it prevents ZFS from seeing the upper two.
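Each ZFS label is 256 KiB; L0 and L1 sit at the start of the vdev and L2 and L3 in its last 512 KiB, at offsets computed from the device size. The sketch below (zfs_label_offsets is a hypothetical name) illustrates why a device that suddenly reports fewer sectors loses its upper two labels: ZFS looks for L2 and L3 relative to the new, smaller end, while they were written relative to the original one. It assumes the size is given in bytes and that offsets are relative to the start of the vdev (for example the ZFS partition), not the whole disk.

```shell
# Hypothetical helper: byte offsets of the four ZFS labels on a vdev of
# size $1 bytes. Each label is 256 KiB; the size is first aligned down to a
# multiple of the label size, then L0/L1 go at the start and L2/L3 at the end.
zfs_label_offsets() {
  label=262144
  psize=$(( $1 / label * label ))
  echo "L0=0 L1=$label L2=$(( psize - 2 * label )) L3=$(( psize - label ))"
}
```

Comparing the output for the size reported with and without the HPA (for instance via blockdev --getsize64 on the partition) shows how far the expected L2/L3 positions move; zdb -l on the device dumps whichever labels ZFS can actually read.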

Solution: boot Linux and use hdparm to inspect and remove the HPA. Be very careful: this can easily destroy your data for good. Consult the article and the hdparm man page (parameter -N) for details.
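Before changing anything, running hdparm -N /dev/sdX without a value only reports the current and native maximum sector counts. A hypothetical helper that turns that report into the number of hidden sectors might look like this; it assumes the " max sectors = CURRENT/NATIVE, HPA is ..." line format printed by hdparm, and it never writes to the drive.

```shell
# Hypothetical helper: number of sectors hidden by an HPA, parsed from
# `hdparm -N /dev/sdX` output on stdin (read-only query, needs root).
# Assumes a line like " max sectors   = CURRENT/NATIVE, HPA is enabled".
hpa_hidden_sectors() {
  awk -F '[=/,]' '/max sectors/ { print $3 - $2 }'
}
```

Used as hdparm -N /dev/sdX | hpa_hidden_sectors, a non-zero result would indicate that an HPA is hiding the end of that drive.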

The problem did not occur only with the new motherboard; I had a similar issue when connecting the drives to a SAS controller card. The solution is the same.
