Linux – Building raw disk images: best way to delete files for compression

disk-image filesystems linux

I'm building raw disk images (i.e., dd, then chroot to install Linux). During the customization process I may delete files, use temporary files, etc.

What is the best way to delete these files to ensure the image is most compressible?

I'm assuming that if I simply rm a file, the filesystem just updates its metadata to mark the blocks as available. This leaves the data in place, so when I gzip or bzip2 the image, it still has to pack that data up. I assume the result would be much smaller if I could tell the filesystem to write zeros to the freed blocks instead.
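To illustrate the assumption above, here is a minimal sketch that compares how gzip handles a zero-filled region and a region of leftover data. The file names are made up for the example, and random bytes stand in for stale file contents, which is a worst case; real deleted data usually compresses somewhat better than that.

```shell
#!/bin/sh
# Compare gzip output sizes for 1 MiB of zeros and 1 MiB of "stale" data.
# Assumption: random bytes from /dev/urandom stand in for leftover file contents.
workdir=$(mktemp -d)

# 1024 blocks of 1024 bytes each = 1 MiB per sample file.
dd if=/dev/zero    of="$workdir/zeros.bin" bs=1024 count=1024 2>/dev/null
dd if=/dev/urandom of="$workdir/stale.bin" bs=1024 count=1024 2>/dev/null

# Count compressed bytes; tr strips the padding some wc implementations add.
zeros_gz=$(gzip -c "$workdir/zeros.bin" | wc -c | tr -d ' ')
stale_gz=$(gzip -c "$workdir/stale.bin" | wc -c | tr -d ' ')

echo "zeros: $zeros_gz bytes after gzip"
echo "stale: $stale_gz bytes after gzip"

rm -rf "$workdir"
```

The zero-filled sample should shrink to a tiny fraction of its size while the stale sample barely shrinks at all, which is why zeroing freed blocks before compressing the image pays off.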

A bit of detail: these are CentOS 6.4 installs on ext4, but I would expect the answer to apply to most Linux distros using most filesystems. I generate the base filesystem with a command like dd if=/dev/zero of=filesystem.image bs=1M count=10240. A typical 10 GB disk image from a vanilla install compresses down to roughly 500 MB. I bet that with a more aggressive cleanup of temp files and such, I could get it a lot smaller.

Thanks!

Best Answer

Zoredache's comment got me on the right track (see How to zero fill a virtual disk's free space on windows for better compression?).

As far as I can tell, the zerofree homepage is http://intgat.tigress.co.uk/rmy/uml/index.html. Two things are hosted there: the zerofree tool and a kernel patch. The kernel patch adds a mount flag that zeroes out files on delete, but it does not work with ext4. The zerofree program does work on ext4.

There are no zerofree RPMs available for CentOS 6, but I was able to compile it on CentOS 6 using the CentOS 5 SRPM.
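Once zerofree is built, running it against the image might look like the sketch below. It assumes the image file holds a bare ext2/3/4 filesystem (no partition table) and is not mounted read-write; the function name zero_and_compress is made up for this example. An image with a partition table would first need the partition exposed as its own device, which is not shown here.

```shell
#!/bin/sh
# Sketch: zero the unused blocks of a filesystem image, then gzip it.
# Assumptions: $1 is a bare ext2/3/4 filesystem image that is unmounted
# (or mounted read-only), and zerofree is on the PATH.
zero_and_compress() {
    image="$1"

    # Refuse to continue if the image file does not exist.
    [ -f "$image" ] || { echo "no such image: $image" >&2; return 1; }

    # Bail out with a clear message if zerofree has not been installed.
    command -v zerofree >/dev/null 2>&1 || {
        echo "zerofree not found on PATH" >&2
        return 127
    }

    # -v prints progress while free blocks are overwritten with zeros.
    zerofree -v "$image" || return 1

    # Compress a copy, leaving the original image in place.
    gzip -c "$image" > "$image.gz"
}

# Example call (commented out so the sketch has no side effects):
# zero_and_compress filesystem.image
```

Run it only after unmounting the image and leaving the chroot, since zerofree will not operate on a filesystem that is mounted read-write.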