I'm trying to set up a backup script on Ubuntu. Every day I want to copy my local source directory to a backup directory on a remote server, uniquely named with the date (e.g., backup-jan1/, backup-jan2/, etc.). It should store a mirror of the earliest state and use difference files to recreate the later backup points.
This is pretty simple with rsync. I've already set up a script that makes the backup, names the backup directory after the current day, and makes a symlink to the most recent backup (IP has been edited):
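# Two-digit month and day, e.g. 0102 for Jan 2
date=$(date "+%m%d")
# Copy /srv to a dated directory on the backup server over ssh
rsync -ave ssh /srv root@150.69.32.8:/backup/backup-$date/
# Repoint the "current" symlink at the newest backup
ssh root@150.69.32.8 rm -rf /backup/current
ssh root@150.69.32.8 ln -s backup-$date/ /backup/current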
However, here's the tricky part: I don't want it to copy files that have not changed. So, if any files have changed since the last daily backup it will copy them, like normal. Otherwise, it will symlink unchanged, previously backed-up files from their first backup directory to the new backup. (Kind of like git)
So, for example, let's say I start the backup on Jan 1. The backup-jan1/ directory will contain all the original backup files. The next day, the Jan 2 backup should copy just the files changed in those 24 hours. For all other files, it will make symlinks to the Jan 1 backup files. On Jan 3, I add a file and delete another. If a file is removed, it should not continue to be symlinked.
Example directory/file structure:
backup-jan1/ (initial backup)
    file_a
    file_b
backup-jan2/ (no changes)
    file_a (symlink to ../backup-jan1/file_a)
    file_b (symlink to ../backup-jan1/file_b)
backup-jan3/ (removed file_a symlink and added file_c)
    file_b (symlink to ../backup-jan1/file_b)
    file_c
...
I've tried to look for this version-control type functionality in rsync and rsnapshot, but I haven't found it yet. Can anyone suggest a backup strategy like this?
Best Answer
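What you seem to be looking for is the --link-dest functionality that is part of rsync. What you describe is exactly how dirvish operates. The --link-dest option compares each file against another copy of the structure (typically the previous backup) and, where the file is unchanged, creates a hard link to that copy in the destination instead of transferring it again.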
With dirvish you perform an initial backup, which just uses rsync.
After that, each additional backup is hard-linked to the previous successful backup, meaning unchanged files are not duplicated. You can directly access any single backup from within the vault, and each backup is a complete and full backup. You can remove previous backups at any time.
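Removing an older backup is safe because a hard link is just another name for the same data on disk, and the data is only freed once the last name is gone. A small sketch with plain `ln`, using made-up directory names and no rsync or dirvish involved:

```shell
#!/bin/sh
# Shows that deleting one hard-linked "backup" leaves the other intact.
# Directory and file names are hypothetical, chosen to mirror the question.
set -eu

work=$(mktemp -d)
mkdir "$work/backup-jan1" "$work/backup-jan2"
echo "payload" > "$work/backup-jan1/file_b"

# Give the same file a second name inside the newer backup.
ln "$work/backup-jan1/file_b" "$work/backup-jan2/file_b"

# The second field of "ls -l" is the number of names (links) for the file.
links_before=$(ls -l "$work/backup-jan2/file_b" | awk '{print $2}')

# Delete the older backup entirely.
rm -rf "$work/backup-jan1"

links_after=$(ls -l "$work/backup-jan2/file_b" | awk '{print $2}')
content=$(cat "$work/backup-jan2/file_b")

echo "links before: $links_before, after: $links_after"
echo "content still readable: $content"
```

Since both names must live on the same filesystem for `ln` to succeed, the same constraint applies to a set of hard-linked backups.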
Here is a script that you can use to demonstrate.