Git-style incremental backup with rsync

backup, rsync, version control

I'm trying to set up a backup script on Ubuntu. Every day I want to copy my local source directory to a backup directory on a remote server, uniquely named with the date (e.g., backup-jan1/, backup-jan2/, etc.). It should store a mirror of the earliest state and use difference files to recreate the later backup points.

This is pretty simple with rsync. I've already set up a script that makes the backup, names the backup directory after the current day, and makes a symlink to the most recent backup (IP has been edited):

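# name today's backup after the month and day, e.g. 0102
date=$(date "+%m%d")
rsync -av -e ssh /srv "root@150.69.32.8:/backup/backup-$date/"
# repoint the "current" symlink at the newest backup
ssh root@150.69.32.8 rm -rf /backup/current
ssh root@150.69.32.8 ln -s "backup-$date/" /backup/current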

However, here's the tricky part: I don't want it to copy files that have not changed. So, if any files have changed since the last daily backup, it will copy them as normal. Otherwise, it will symlink unchanged, previously backed-up files from their first backup directory into the new backup (kind of like Git).

So, for example, let's say I start the backup on Jan 1. The backup-jan1/ directory will contain all the original backup files. The next day, the Jan 2 backup should copy just the files changed in those 24 hours. For all other files, it will make symlinks to the Jan 1 backup files. On Jan 3, say I add one file and delete another. If a file is removed, it should no longer be symlinked.

Example directory/file structure:

backup-jan1/ (initial backup)
    file_a
    file_b

backup-jan2/ (no changes)
    file_a (symlink to ../backup-jan1/file_a)
    file_b (symlink to ../backup-jan1/file_b)

backup-jan3/ (removed file_a symlink and added file_c)
    file_b (symlink to ../backup-jan1/file_b)
    file_c

...

I've tried to look for this version-control type functionality in rsync and rsnapshot, but I haven't found it yet. Can anyone suggest a backup strategy like this?

Best Answer

What you are looking for is rsync's --link-dest option. What you describe is also exactly how dirvish operates.

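The --link-dest option makes rsync hard-link each unchanged file in the new destination to the matching file in a reference directory (normally the previous backup) instead of copying it again. Only new or changed files are actually transferred. Note that these are hard links, not the symlinks you asked for.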
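If hard links are unfamiliar, here is a minimal sketch of what one is, independent of rsync. It assumes GNU coreutils (stat -c, as shipped with Ubuntu); the file names are made up for the illustration.

```shell
# Sketch: a hard link is a second name for the same inode (same data on disk).
demo=$(mktemp -d)
echo "hello" > "$demo/original"
ln "$demo/original" "$demo/linked"      # no -s: this is a hard link

# Both names report the same inode number and a link count of 2.
inode_a=$(stat -c %i "$demo/original")
inode_b=$(stat -c %i "$demo/linked")
links=$(stat -c %h "$demo/linked")
echo "inodes: $inode_a $inode_b, link count: $links"

# Removing one name does not remove the data; the other name still works.
rm "$demo/original"
cat "$demo/linked"                      # prints: hello
```

This is why a backup made with --link-dest stays complete even after the older backup it was linked against is deleted.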

With dirvish you perform an initial backup, which just uses rsync.

After that, each additional backup is hard-linked to the previous successful backup, so unchanged files are not duplicated. You can directly access any single backup from within the vault, and each backup is a complete, full backup. You can remove previous backups at any time, because a file's data is kept until its last hard link is removed.
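You don't need dirvish to get this behaviour for the daily scheme in the question. The following is only a sketch under some assumptions: make_snapshot is a hypothetical helper name, it works on local paths, and it keeps the question's convention of a "current" symlink pointing at the newest backup.

```shell
# Hypothetical helper: snapshot SRC into ROOT/backup-LABEL, hard-linking
# unchanged files against whatever snapshot ROOT/current points to.
# LABEL defaults to today's date.
make_snapshot() {
    src=$1
    root=$2
    label=${3:-$(date +%Y-%m-%d)}
    dest="$root/backup-$label"
    mkdir -p "$root" || return 1

    if [ -d "$root/current" ]; then
        # Resolve the symlink to an absolute path for --link-dest.
        prev=$(cd "$root/current" && pwd -P)
        rsync -a --link-dest="$prev" "$src/" "$dest/" || return 1
    else
        # First run: nothing to link against, so do a plain copy.
        rsync -a "$src/" "$dest/" || return 1
    fi

    # Repoint "current" at the snapshot just made.
    ln -sfn "backup-$label" "$root/current"
}
```

Something like make_snapshot /srv /backup would then run once a day. For a remote destination, as in the question, the --link-dest path is interpreted on the receiving side, so it would name a directory on the backup server.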

Here is a script you can use to demonstrate this.

# create test area
mkdir -p /tmp/backuptest/{source,dest1,dest2,dest3}
for a in `seq 10` ; do dd if=/dev/urandom of=/tmp/backuptest/source/file$a bs=1M count=1; done

# look
find /tmp/backuptest/ -ls ; du -h /tmp/backuptest/

# initial backup
rsync -va /tmp/backuptest/source/ /tmp/backuptest/dest1/

# look
find /tmp/backuptest/ -ls ; du -h /tmp/backuptest/

# make changes
rm /tmp/backuptest/source/file[2-4]
cat /tmp/backuptest/source/file[6-7] >/tmp/backuptest/source/file11

# new backup linked to previous
rsync -va /tmp/backuptest/source/ /tmp/backuptest/dest2/ --link-dest=/tmp/backuptest/dest1/

# look
find /tmp/backuptest/ -ls ; du -h /tmp/backuptest/

# make changes
rm /tmp/backuptest/source/file5
cat /tmp/backuptest/source/file[5-7] >/tmp/backuptest/source/file12

# new backup linked to previous
rsync -va /tmp/backuptest/source/ /tmp/backuptest/dest3/ --link-dest=/tmp/backuptest/dest2/

# look
find /tmp/backuptest/ -ls ; du -h /tmp/backuptest/

# remove dest1
rm -r /tmp/backuptest/dest1/

# see that dest2 and dest3 are still complete backups of the state at those times
find /tmp/backuptest/ -ls ; du -h /tmp/backuptest/
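To check how much a given backup really shares with the others, you can count the files in it that have more than one hard link. This is a small sketch; count_shared is a made-up helper name, and it relies only on the standard find -links test.

```shell
# Hypothetical helper: print how many regular files under DIR have more
# than one hard link, i.e. are shared with at least one other name.
count_shared() {
    find "$1" -type f -links +1 | wc -l | tr -d ' '
}
```

Run against the demo above, count_shared /tmp/backuptest/dest2 should report the unchanged files for as long as another backup still links to them. After the last other backup is removed, the count drops because each file is back to a single link.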