Using rsync to quickly upload a file that is similar to another file

deployment file-transfer rsync

I'm putting together a deployment script which tars up a directory of my code, names the tar file after the current date and time, pushes that up to the server, untars it in a directory of the same name and then swaps a "current" symlink to point at the new directory. This means my older deployments stay around in timestamped directories (at least until I delete them).

The tar file is around 5MB and it takes nearly a minute to transfer. I'd like to speed this up.

I assume each new tarball is pretty similar in structure to the previous tarball (since I'm often only changing a few lines of source code in between deployments). Is there a way to take advantage of this fact to speed up my uploads using rsync?

Ideally I'd like to say "hey rsync, upload this local file called 2009-10-28-222403.tar.gz to my server, but it's only a tiny bit different from the file 2009-10-27-101155.tar.gz which is already up there, so try to just send over the differences". Is this possible, or is there another tool I should be looking at?

Best Answer

I'm putting together a deployment script which tars up a directory of my code, names the tar file after the current date and time, pushes that up to the server, untars it in a directory of the same name and then swaps a "current" symlink to point at the new directory.

Personally, I think you should skip tar altogether and look at rsync's --link-dest or --copy-dest options instead. --link-dest is pretty cool: you point it at the directory from the previous sync, and any file that is identical there gets hard-linked into the new directory instead of being transferred again. (--copy-dest works the same way, but makes a local copy of the unchanged file rather than a hard link.)

mkdir -p /srv/codebackup/2009-10-12 \
         /srv/codebackup/2009-10-13

# first backup on 10-12
rsync -a sourcehost:/sourcepath/ \
         /srv/codebackup/2009-10-12/

# second backup made on 10-13
rsync -a --link-dest=/srv/codebackup/2009-10-12/ \
         sourcehost:/sourcepath/ \
         /srv/codebackup/2009-10-13/

Your second run of rsync will transfer only the files that changed. Identical files are hard-linked to the copies in the previous tree, so you are not storing multiple copies of the same file and you save a lot of space. You can still delete the older tree and the new backup will remain 100% complete, because a hard-linked file's data stays on disk until its last link is removed.
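Applied to the deployment workflow from the question, the same idea could look roughly like the sketch below. It is only an illustration under some assumptions: deploy_release and its three arguments are hypothetical names, the releases directory is given as an absolute local path (--link-dest treats a relative path as relative to the destination directory), and the "current" symlink stores just the release name. For a remote server, the destination would become host:path and the symlink swap would have to run over ssh.

```shell
#!/bin/sh
# Sketch only: deploy_release and its arguments are hypothetical names.
# Copies a source tree into a new release directory, hard-linking files
# that are unchanged since the release that "current" points at.

deploy_release() {
    src=$1          # local source directory
    dest_root=$2    # absolute path of the directory holding all releases
    stamp=$3        # new release name, e.g. "$(date +%Y-%m-%d-%H%M%S)"

    mkdir -p "$dest_root" || return 1

    if [ -L "$dest_root/current" ]; then
        # A previous release exists: use it as the --link-dest reference.
        prev=$(readlink "$dest_root/current")
        rsync -a --link-dest="$dest_root/$prev/" \
              "$src/" "$dest_root/$stamp/" || return 1
    else
        # First deployment: nothing to link against, do a plain copy.
        rsync -a "$src/" "$dest_root/$stamp/" || return 1
    fi

    # Repoint "current" at the new release (-n replaces the link itself
    # instead of descending into the directory it points to).
    ln -sfn "$stamp" "$dest_root/current"
}
```

Calling something like deploy_release ./mycode /srv/releases "$(date +%Y-%m-%d-%H%M%S)" twice in a row should leave two release directories in which unchanged files share an inode; you can check that with ls -li, or with [ file1 -ef file2 ] in most shells.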