Linux – Sparse files significantly larger (but still sparse) after copying over a network

linux

When attempting to copy sparse VM image files from one KVM hypervisor to another over the network, I see the following behavior:

  • The sparse files are still sparse files
  • The copied sparse files are significantly larger than the original sparse files

Source:

[root@kvm1 thin_images]# ls -lhs
total 2.6G
1.4G -rw-------. 1 root root 8.0G Jul 20 11:10 centos6-8g.img
1.3G -rw-------. 1 root root 8.0G Jul 20 10:50 debian7-8g.img

Destination:

[root@kvm2 thin_images]# ls -lhs
total 11G
4.8G -rw-------. 1 root root 8.0G Jul 20 11:10 centos6-8g.img
6.2G -rw-------. 1 root root 8.0G Jul 20 10:50 debian7-8g.img

As you can see, the sparse file for a CentOS image is now 4.8G instead of 1.4G. For the Debian image, it grew from 1.3G to 6.2G.
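
For reference, the two numbers that `ls -lhs` shows per file (allocated size in the first column, apparent size in the size column) can also be read programmatically. This is a minimal sketch, assuming Linux semantics for `st_blocks`; the helper name `sparse_report` is made up for illustration:

```python
import os

def sparse_report(path):
    """Return (apparent_bytes, allocated_bytes) for the file at path."""
    st = os.stat(path)
    # On Linux, st_blocks counts 512-byte units, independent of the
    # filesystem's own block size.
    return st.st_size, st.st_blocks * 512
```

A file is sparse when the allocated figure is smaller than the apparent one; comparing the allocated figure on both hosts gives the growth described above.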

The methods I've tried for copying over the network include a dirty nc-tar pipe and rsync with the --sparse and --inplace options. The hypervisors aren't on new enough Linux kernels to use the SEEK_HOLE functionality of bsdtar, nor do they have bsdtar itself.

Any explanation for this behavior? Is it possible for the destination sparse files to stay the same size as the original sparse files after copying them over a network?

Other info:

[root@kvm1 thin_images]# uname -a
Linux kvm1 2.6.32-504.23.4.el6.x86_64 #1 SMP Tue Jun 9 20:57:37 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
[root@kvm1 thin_images]# yum list installed rsync tar nc
Loaded plugins: fastestmirror, security
Loading mirror speeds from cached hostfile
 * base: centos-mirror.jchost.net
 * extras: mirror.spro.net
 * updates: mirror.es.its.nyu.edu
Installed Packages
nc.x86_64                                                  1.84-22.el6                                                 @base                                   
rsync.x86_64                                               3.0.6-12.el6                                                @anaconda-CentOS-201410241409.x86_64/6.6
tar.x86_64                                                 2:1.23-11.el6                                               @anaconda-CentOS-201410241409.x86_64/6.6

Best Answer

rsync and similar tools will typically only create a hole after a run of zero bytes of some minimum length, and, as far as I recall (I would need to read the source code to confirm), they make that decision per block. A block containing even a single non-zero byte is therefore written out in full, and the whole block gets allocated, rather than the tool seeking to that byte, writing it, and seeking past the rest. In the original file(s) the granularity would be 512-byte blocks, but the transfer tools work (for optimization) in blocks of something like 64 KB. So a single byte set within a 64 KB block results in the full 64 KB being written to disk, instead of the seeks that would keep most of that "block" sparse.
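
The effect of the decision granularity can be illustrated with a small simulation. This is only a sketch of the reasoning above, not rsync's actual algorithm: `bytes_written` is a hypothetical helper, the 512-byte and 64 KB figures are the ones recalled in this answer, and real on-disk allocation is additionally rounded up to the filesystem block size.

```python
def bytes_written(data, chunk):
    """Count the bytes a copier would write if it skips only all-zero chunks."""
    written = 0
    for off in range(0, len(data), chunk):
        piece = data[off:off + chunk]
        # Any non-zero byte forces the whole chunk to be written.
        if piece.count(0) != len(piece):
            written += len(piece)
    return written

# Hypothetical image: 1 MiB of zeros with one non-zero byte every 64 KiB.
image = bytearray(1024 * 1024)
for off in range(0, len(image), 64 * 1024):
    image[off] = 0xFF

print(bytes_written(image, 512))        # 16 chunks of 512 bytes
print(bytes_written(image, 64 * 1024))  # every 64 KiB chunk, i.e. the whole image
```

With the same 16 non-zero bytes, the coarse granularity writes 128 times as much data, which is the kind of growth seen in the question.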

You might see similar results even when running rsync on those images within the local filesystem.

To make the files sparse again after the transfer, have a look at these:

https://rwmj.wordpress.com/2010/10/19/tip-making-a-disk-image-sparse/
http://blog.easter-eggs.org/index.php/post/2013/09/24/Convert-an-unsparse-vm-image-to-sparse-vm-image

The advice in the link you've given would then also apply:

  1. rsync --sparse local dest://directory/
  2. use those tools to make it sparse again
  3. use rsync --inplace in all subsequent runs
  4. re-sparse files if they grow "too big" again
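
The re-sparsifying in steps 2 and 4 boils down to rewriting the file while seeking over zero-filled regions instead of writing them, which is roughly what `cp --sparse=always` does. A minimal sketch of that idea, assuming the image is not in use by a running VM and the destination filesystem supports holes; `copy_sparse` and the 4096-byte chunk size are illustrative choices, not taken from the linked tools:

```python
import os

def copy_sparse(src, dst, chunk=4096):
    """Copy src to dst, leaving holes where src has all-zero chunks."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            buf = fin.read(chunk)
            if not buf:
                break
            if buf.count(0) == len(buf):
                # Skip over the zeros; the filesystem leaves a hole here.
                fout.seek(len(buf), os.SEEK_CUR)
            else:
                fout.write(buf)
        # Seeking alone does not extend a file, so set the final size
        # explicitly in case the source ends in a run of zeros.
        fout.truncate(fin.tell())
```

A smaller chunk size finds more holes but costs more system calls; chunks smaller than the filesystem block size save no additional space.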