ZFS – Why Is the Estimated Send Size Larger Than the Total Data Size?

zfs zfsonlinux

From zfs send -R -v pool/fs@snap:

send from @ to pool/fs@snap estimated size is 6.50T

…but from zpool list:

NAME   SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
pool  3.62T  2.36T  1.27T    65%  2.87x  ONLINE  -

Can a zfs send stream really be several times larger than the pool from which it's taken?


Observed w/ ZFS on Linux 0.6.1.

Best Answer

As tegbains pointed out in a comment, zfs send streams do not benefit from any storage-level deduplication in place. Nor do they carry over other storage-level settings; this is why zfs send | zfs receive can be used to migrate data to new settings that would otherwise only take effect once the data is rewritten, such as enabling or disabling deduplication, or changing compression algorithms.

This is the main reason why your zfs send stream is so much bigger than the allocated storage space. As for why the stream is not deduplicated by default: beyond the principle of least surprise (if you need a reason), a likely one is that deduplication (especially in ZFS) is very costly, and a decision was made that zfs send streams should be receivable on lower-spec'd systems.

Your data shows about 2.36 TB allocated, with an overall deduplication ratio of 2.87x. Naively multiplying these two numbers yields 6.77 TB, which is close enough to the estimated 6.50 TB to be a reasonable ballpark figure. Note that the 6.50 TB figure relates to one snapshot of one file system, whereas the 2.36 TB * 2.87 figure relates to the entire pool, so the two are not expected to match exactly.
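The back-of-the-envelope check above can be written out as a small Python sketch. It only redoes the arithmetic on the figures quoted in the question; the helper names (parse_size, logical_size_estimate) are made up for illustration, and the assumption that the single-letter suffixes are 1024-based is mine rather than something stated in the output.

```python
# Rough cross-check of the figures quoted in the question; a sketch, not a ZFS tool.

# Multipliers for the single-letter size suffixes (assumed to be 1024-based).
UNITS = {"K": 1024**1, "M": 1024**2, "G": 1024**3, "T": 1024**4, "P": 1024**5}


def parse_size(text):
    """Convert a string such as '2.36T' to a byte count (hypothetical helper)."""
    text = text.strip()
    suffix = text[-1].upper()
    if suffix in UNITS:
        return float(text[:-1]) * UNITS[suffix]
    return float(text)


def logical_size_estimate(alloc, dedup_ratio):
    """Naive pre-dedup size: ALLOC multiplied by the DEDUP ratio (e.g. '2.87x')."""
    return parse_size(alloc) * float(dedup_ratio.rstrip("xX"))


# Values copied from the zpool list and zfs send -v output in the question.
estimate = logical_size_estimate("2.36T", "2.87x")
reported = parse_size("6.50T")

print(f"naive estimate (ALLOC * DEDUP): {estimate / UNITS['T']:.2f}T")
print(f"zfs send estimated size:        {reported / UNITS['T']:.2f}T")
print(f"relative difference:            {(estimate - reported) / reported:.1%}")
```

With the numbers from the question, the naive estimate comes out at 6.77T, roughly 4% above the 6.50T that zfs send reported, which is consistent with the pool-wide figure covering slightly more than the single snapshot.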

If your ZFS implementation supports that option, you may have some luck with zfs send -D (generate deduplicated zfs send stream).

Regarding "Observed w/ ZFS on Linux 0.6.1": this is not directly related to your question, but I would suggest upgrading. Stable ZoL is at 0.6.4.1 as of this writing (June 2015), and there have been numerous enhancements and fixes since 0.6.1 came out in March 2013.