Rsync: locally cache files to be sent over slow link


This concerns the incremental hard-link method of backups: each day a new directory is created containing hard links to the files of the previous backup, and rsync then replaces the hard links with new copies of anything that has changed since the last backup.
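
For readers unfamiliar with the method, here is a minimal sketch of one such daily snapshot. It assumes GNU `cp` (for `-al`) and a backup directory on a local or mounted filesystem; the function name `snapshot` and the directory layout are hypothetical, not something from my actual setup.

```shell
#!/bin/sh
# Sketch only: the function name and directory names are hypothetical.
# Usage: snapshot SRC BACKUP_ROOT PREV TODAY
snapshot() {
    src=$1 root=$2 prev=$3 today=$4

    # Seed today's directory with hard links to the previous backup (GNU cp).
    if [ -d "$root/$prev" ] && [ ! -e "$root/$today" ]; then
        cp -al "$root/$prev" "$root/$today" || return 1
    fi

    # By default rsync writes a changed file as a new file and renames it
    # into place, so the previous backup keeps its old contents while
    # unchanged files stay shared between the two directories.
    rsync -a --delete "$src/" "$root/$today/"
}
```

For example, `snapshot /srv/app/data /mnt/backups 2024-01-01 2024-01-02`. Note that rsync's `--inplace` option would overwrite the shared file and so must not be used with this scheme.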

I have to contend with a really slow remote rsync target: a NetGear ReadyNAS Duo with a tiny brain that can only handle SSH copy rates of about 2 megabit/sec, while I have 50+ megabit of bandwidth available. I don't want the program being backed up to be shut down for hours while backup data trickles to the slow remote NAS.

Is there a way to make a fast temporary local copy of the specific files to be rsynced, to minimize application/database downtime?

From what I can determine, the best way to minimize application downtime would be something like this:

  1. Stop the program/database to be backed up
  2. Run rsync against remote location, but only list files it would copy
  3. Use cp to copy those specific files/dirs to fast temporary local storage
  4. Restart the program/database to be backed up
  5. Rsync normally from the temp dir to the slow remote storage
  6. Delete the temporary local file copies
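
The six steps above could be scripted roughly as follows. This is only a sketch under assumptions: every path, the host `nas.example`, the service name `myapp`, the use of `systemctl`, and the function name `backup_via_stage` are hypothetical placeholders, and a local rsync with `--files-from` stands in for `cp` in step 3 so the directory layout is preserved.

```shell
#!/bin/sh
# Sketch only: every path, host and service name is a hypothetical placeholder.
SRC=${SRC:-/srv/app/data/}                          # data to back up
STAGE=${STAGE:-/var/tmp/backup-stage/}              # fast local scratch space
DEST=${DEST:-backup@nas.example:/backups/current/}  # slow remote target
SERVICE=${SERVICE:-myapp}
LIST=${LIST:-/var/tmp/backup-changed.list}

backup_via_stage() {
    mkdir -p "$STAGE" || return 1

    # 1. Stop the application so its files are consistent.
    systemctl stop "$SERVICE" || return 1

    # 2. Dry run (-n) against the remote: record what would be sent, copy nothing.
    rsync -an --out-format='%n' "$SRC" "$DEST" > "$LIST"
    listed=$?

    # 3. Copy only the listed paths to fast local storage.
    copied=1
    if [ "$listed" -eq 0 ]; then
        rsync -a --files-from="$LIST" "$SRC" "$STAGE"
        copied=$?
    fi

    # 4. Restart the application whether or not the copy succeeded.
    systemctl start "$SERVICE"
    [ "$copied" -eq 0 ] || return 1

    # 5. Slow transfer, with the application already running again.
    rsync -a "$STAGE" "$DEST" || return 1

    # 6. Remove the temporary copies.
    rm -rf "$STAGE" "$LIST"
}
```

Calling `backup_via_stage` at the end of the script runs the whole sequence. Two caveats with this approach: the dry run in step 2 still has to compare file lists with the slow NAS while the application is stopped, and because the staging directory holds only the changed files, step 5 cannot use `--delete` to propagate removals.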

Is this the fastest way to cache the files to be rsynced remotely, or is there a better method?

Is there a more automatic method that doesn't require all these separately scripted steps?

Best Answer

rsync doesn't have a feature that specifically helps in that situation. However, you can fake it in one of the following ways:

  • Stop the database, do a backup to the local disk, start the database. Use rsync to back up that local copy. Pros: Minimizes downtime. Cons: You'll need twice as much disk space locally.
  • Do two rsync runs. The first uses --exclude= and --include= flags to skip the database. The second stops the database, uses a different set of --exclude= and --include= flags to back up only the database files, then restarts the database. Pros: No extra local disk space needed. Cons: Longer downtime, more difficult to manage (constructing the right include/exclude flags is manual and error-prone).
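
As a rough sketch of both options, assuming the database files live in a single subdirectory and the service is managed with `systemctl` (all paths, the host `nas.example`, the service name `mydb` and both function names are hypothetical):

```shell
#!/bin/sh
# Sketch only: paths, host and service name are hypothetical placeholders.
DATA=${DATA:-/srv/}                      # everything to back up
DB_SUBDIR=${DB_SUBDIR:-db}               # database files, relative to DATA
SNAP=${SNAP:-/var/backups/db-snapshot/}  # local copy (costs extra disk space)
DEST=${DEST:-backup@nas.example:/backups/current/}
SERVICE=${SERVICE:-mydb}

# Option 1: short downtime, needs a second local copy of the database.
backup_local_copy_first() {
    systemctl stop "$SERVICE" || return 1
    rsync -a --delete "$DATA$DB_SUBDIR/" "$SNAP"   # fast, local disk only
    copied=$?
    systemctl start "$SERVICE"
    [ "$copied" -eq 0 ] || return 1
    # Slow transfer of the snapshot while the database is running again.
    rsync -a --delete "$SNAP" "$DEST$DB_SUBDIR/"
}

# Option 2: no extra disk space, but the database is down for the slow transfer.
backup_two_passes() {
    # Pass 1: everything except the database, while it keeps running.
    rsync -a --exclude="/$DB_SUBDIR/" "$DATA" "$DEST" || return 1
    # Pass 2: only the database directory, while the database is stopped.
    systemctl stop "$SERVICE" || return 1
    rsync -a --include="/$DB_SUBDIR/***" --exclude='*' "$DATA" "$DEST"
    sent=$?
    systemctl start "$SERVICE"
    return "$sent"
}
```

In this sketch `backup_local_copy_first` handles only the database directory; the remaining files could be sent with the same command as pass 1 of `backup_two_passes`. If `SNAP` is kept between runs, the local copy is itself incremental, which shortens the downtime further.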

I would choose the first option if at all possible. One thing that would make it easier is a very well-defined split between what is backed up and what is not, so that the rsync command line is simpler and easier to get right.
