Linux – Store multiple versions of a large binary file with minimal data duplication

backup, deduplication, filesystems, linux, storage

I need to store multiple versions of a ~150 GB binary file (a qcow2 disk image) on Linux servers with local storage. I am hoping there is a solution that keeps only diffs, which can be merged back as needed, so that I don't have to create another full copy of a 150 GB file when only 4 GB have changed. This is a storage question, not a question about KVM/qcow2-specific features; I have already explored some of those options.

I am currently using CentOS 6.3 with ext4. The files need to be stored indefinitely and must be completely intact when restored. I am willing to change filesystems, etc., if a solution is worth it.
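For illustration, this is the kind of diff-and-merge workflow I have in mind, sketched here with xdelta3 purely as an example of a generic binary delta tool (the file names are made up):

    # Encode a binary delta between two versions of the image
    xdelta3 -e -s image-v1.qcow2 image-v2.qcow2 v1-to-v2.vcdiff

    # Later, reconstruct v2 from v1 plus the stored delta
    xdelta3 -d -s image-v1.qcow2 v1-to-v2.vcdiff image-v2-restored.qcow2

Something that did this transparently at the storage layer would be ideal.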

Best Answer

ZFS on Linux with deduplication enabled may be your friend in this case. There are RPM packages and repositories available for Red Hat/CentOS installation.
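As a rough sketch (the disk and pool names here are placeholders, and exact package names depend on which ZFS on Linux release you install), enabling dedup on a dataset looks like this:

    # Create a pool on a spare disk (/dev/sdb is a placeholder) and a
    # dataset for the images with deduplication enabled
    zpool create tank /dev/sdb
    zfs create -o dedup=on tank/images

    # Verify the setting and watch the dedup ratio as versions accumulate
    zfs get dedup tank/images
    zpool list tank

One caveat: ZFS keeps its dedup table in RAM, so it wants a lot of memory relative to the amount of deduplicated data. Test with your actual images before committing to it.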

Even without dedupe, if you can work these images into a ZFS snapshot workflow, there are significant advantages to attempting this with ZFS.
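For illustration, a snapshot-based workflow might look like the following (the dataset and snapshot names are made up); each snapshot only consumes space for blocks that changed since the previous one:

    # Take a snapshot after each revision of the image
    zfs snapshot tank/images@rev1
    # ... the 150 GB file is modified in place; only ~4 GB of blocks change ...
    zfs snapshot tank/images@rev2

    # The USED column shows the space each snapshot uniquely holds
    zfs list -t snapshot

    # Recover an older revision without disturbing the live copy
    zfs clone tank/images@rev1 tank/images-rev1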

Can you explain a bit more about how you wish to work with these files? Are you seeking point-in-time snapshots, or copying multiple revisions of the same/similar files to the datastore?
