You are looking for duplicity.
Duplicity creates an encrypted, compressed backup of the input data on the client, uses librsync to work out what has changed, and transfers the result over ssh. Because the backups are incremental, it only has to transfer the delta, so bandwidth consumption stays low despite the encryption. A nice side effect is that you can do daily backups and still access the version from n days ago.
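As a rough sketch of what that looks like in practice (the source directory, backup URL, and restore path are made-up placeholders, and duplicity will also want a GnuPG passphrase or key configured, which is not shown here):

```shell
#!/bin/sh
# Sketch only: SRC, DEST and RESTORE_TO are hypothetical placeholders.
SRC="${SRC:-/srv/data}"
DEST="${DEST:-sftp://backup@backup.example.com//srv/backups/data}"
RESTORE_TO="${RESTORE_TO:-/tmp/restore}"

# Print commands by default; set DRY_RUN=0 to actually execute them.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Incremental backup, with a fresh full backup once the last one is a month old.
run duplicity --full-if-older-than 1M "$SRC" "$DEST"

# Restore the data as it was three days ago into a scratch directory.
run duplicity restore --time 3D "$DEST" "$RESTORE_TO"
```

With the default `DRY_RUN=1` the script only prints the two commands, so you can check them before letting it loose on real data.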
Summary: rsync should be perfectly fine for backing up an svn repository, as long as you are not backing up a repository that is currently active. I suspect that you are trying to back up an active repository, which is problematic.
Detail:
You don't say what errors are reported, which makes any attempt at diagnosis difficult. This is something I regularly moan at our users about: if an application gives you a specific message, report that specific message to the people you are asking for diagnostics/support, even if the message is in fact "an error occurred" or similar (as this does happen).
I'm guessing that the problems being reported are relating to files going missing (they were present during the initial scan but moved/renamed/deleted before that backup was complete), being locked, or apparently changed while rsync was reading them. You will see similar errors (or much worse: related but unreported problems) with most backup techniques if backing up a live svn service and you do not completely stop the svn service before starting the backup run.
Stopping all access to the repository while the backup run takes place may not be an option for you even if it is done in the dead of night (as you might have remote developers who work at different hours). If this is the case then there are a few options, including:
1. Use hot-backup.py to do a full backup of the repository while it is live as described in this section of the freely available Version Control with Subversion, which is generally considered recommended reading. This will not be suitable directly for your remote backup as it will result in the full repo being sent over the line each time, but you can do the backup to a temporary local area and perform the rsync (or anything else) based backup on that rather than the live repository.
2. If you are running on Linux and use LVM for your drive partitioning you could use LVM's snapshot facility to perform a similar feat to the one described in option 1. See here and here for example documentation of the technique. This does mean stopping access to the SVN service for a short while, for the length of time that the snapshot takes to be created, but this is near instant so much less likely to be an issue than needing to stop it for the whole backup operation.
3. Use incremental backups of the live repository, also mentioned in the above SVN book.
The LVM technique will be faster than hot-backup.py-then-sync, but is not available to you without a chunk of extra work and learning unless you already use and are familiar with LVM. Its advantages are that it will almost certainly be significantly quicker and will use less disk space (though disk space is pretty cheaply available these days). LVM snapshots do affect write performance while they are present, but the difference is unlikely to be noticeable unless your repository is very busy, and performance will return to normal anyway at the end of the backup run when you drop the snapshot.
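For the LVM route, the sequence would look something like the sketch below. The volume group, volume names, mount point, snapshot size, and remote target are all invented for illustration, and how you pause the SVN service depends on whether you serve it through Apache or svnserve, so that step is only marked with a comment.

```shell
#!/bin/sh
# Sketch only: VG, LV, SNAP, MNT and REMOTE are hypothetical placeholders.
VG="${VG:-vg0}"
LV="${LV:-svn}"
SNAP="${SNAP:-svn-snap}"
MNT="${MNT:-/mnt/svn-snap}"
REMOTE="${REMOTE:-backup@backup.example.com:/srv/backups/svn/}"

# Print commands by default; set DRY_RUN=0 to actually execute them.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# (Pause the SVN service here so the snapshot is taken at a quiet moment.)
# --size is the space reserved for blocks that change while the snapshot exists.
run lvcreate --snapshot --size 1G --name "$SNAP" "/dev/$VG/$LV"
# (Resume the SVN service here; the snapshot now holds a frozen view.)

# Mount the frozen view read-only and sync it to the remote machine.
run mkdir -p "$MNT"
run mount -o ro "/dev/$VG/$SNAP" "$MNT"
run rsync -az --delete -e ssh "$MNT/" "$REMOTE"

# Drop the snapshot so write performance returns to normal.
run umount "$MNT"
run lvremove -f "/dev/$VG/$SNAP"
```

The snapshot only needs enough space to hold whatever changes on the original volume during the backup run, so it can usually be much smaller than the repository itself.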
The hot-backup.py method has the advantage of giving you a local backup too if you don't already have one - if you store the "hot copy" version on another machine you can restore this much more quickly than you can restore the remote copy if the primary machine dies in an event that doesn't affect the other (a drive controller failure, for example). It is also likely to be simpler to implement, unless you already use LVM and are familiar with it.
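A minimal sketch of the hot-copy-then-sync idea follows. Note that it calls `svnadmin hotcopy` directly rather than hot-backup.py (as far as I know the script is a wrapper around that command), so that the copy always lands in the same staging directory and rsync has something stable to compare against. The repository path, staging directory, and remote target are placeholders.

```shell
#!/bin/sh
# Sketch only: REPO, STAGING and REMOTE are hypothetical placeholders.
REPO="${REPO:-/srv/svn/myrepo}"
STAGING="${STAGING:-/var/backups/svn-hotcopy/myrepo}"
REMOTE="${REMOTE:-backup@backup.example.com:/srv/backups/svn/}"

# Print commands by default; set DRY_RUN=0 to actually execute them.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# A plain hotcopy wants a fresh destination, so clear the previous local copy.
run rm -rf "$STAGING"

# Take a consistent copy of the live repository.
run svnadmin hotcopy "$REPO" "$STAGING"

# Sync the static copy, not the live repository, to the remote machine.
run rsync -az --delete -e ssh "$STAGING" "$REMOTE"
```

Because rsync only ever reads the staging copy, nothing changes underneath it mid-transfer, which is the whole point of the exercise.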
Incremental backups will be faster than both of these techniques, but less simple than hotcopy-then-sync and restoration after a complete disaster is potentially more complex unless you use the incremental backups to build a full repo copy at the other end (rather than just storing the incremental information). Rebuilding the repo at the other end is recommended anyway though, as this is a way of testing that your backup is in fact valid - even with the other techniques you should test your backups regularly (mantra: a backup is not a good backup unless it has been tested).
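To give an idea of the extra moving parts, here is a rough sketch of an incremental dump script built on `svnlook youngest` and `svnadmin dump --incremental`. The paths, the state file, and the remote target are assumptions of mine, not anything prescribed by the SVN book.

```shell
#!/bin/sh
# Sketch only: REPO, OUT_DIR and REMOTE are hypothetical placeholders.
REPO="${REPO:-/srv/svn/myrepo}"
OUT_DIR="${OUT_DIR:-/var/backups/svn-incremental}"
STATE="$OUT_DIR/last-dumped-rev"
REMOTE="${REMOTE:-backup@backup.example.com:/srv/backups/svn-incremental/}"

# Print the revision range still to be dumped, or nothing if up to date.
# $1 = last dumped revision (empty on the first run), $2 = youngest revision.
next_range() {
    if [ -z "$1" ]; then
        echo "0:$2"
    elif [ "$2" -gt "$1" ]; then
        echo "$(( $1 + 1 )):$2"
    fi
}

dump_new_revisions() {
    mkdir -p "$OUT_DIR" || return 1
    last=""
    if [ -f "$STATE" ]; then
        last=$(cat "$STATE")
    fi
    head=$(svnlook youngest "$REPO") || return 1
    range=$(next_range "$last" "$head")
    [ -n "$range" ] || return 0          # nothing new since the last run

    # Name the dump file after its range, e.g. 11-12.svndump.
    file="$OUT_DIR/$(echo "$range" | tr ':' '-').svndump"
    svnadmin dump "$REPO" --incremental -r "$range" > "$file" || return 1

    # Record progress only after the dump succeeded, then ship the dumps.
    echo "$head" > "$STATE"
    rsync -az -e ssh "$OUT_DIR/" "$REMOTE"
}

# Only touch the repository when explicitly asked to.
if [ "${1:-}" = "run" ]; then
    dump_new_revisions
fi
```

At the other end, feeding the dump files in revision order to `svnadmin load` on a freshly created repository rebuilds a full copy, which doubles as the backup test mentioned above.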
In summary, rsync should be perfectly fine for backing up an svn repository (as would many other techniques but I'm quite a fan of rsync in most use cases myself) as long as you are not backing up a repository that is currently active - you need to stop the service or back up from some form of snapshot.
Best Answer
I would recommend using rsync+ssh for security reasons. You can use either pull or push backup. For instance, if you decide to use pull based backup, first generate an ssh key on the remote server. You will then pull the files to the remote server from the original backup server.
Remote ('another') server: generate a private/public key pair.
Take the public key generated, say /root/.ssh/id_rsa.pub, to the Backup server.
Backup server: add the public key of the remote server to the authorized_keys of the backup server.
Remote server: test the public key ssh connection.
Remote server: add the rsync command to your crontab.
You can change the username, hostnames, private/public key file names, directories etc. based on your setup.
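The steps above could be sketched roughly as follows. The user name, host name, directories, and the 02:00 schedule are placeholders I made up; only the key path comes from the example above.

```shell
#!/bin/sh
# Sketch only: BACKUP_HOST, SRC_DIR and DEST_DIR are hypothetical placeholders.
KEY="${KEY:-/root/.ssh/id_rsa}"
BACKUP_HOST="${BACKUP_HOST:-backupuser@backup.example.com}"
SRC_DIR="${SRC_DIR:-/srv/backups/}"
DEST_DIR="${DEST_DIR:-/srv/backups-copy/}"

# Print commands by default; set DRY_RUN=0 to actually execute them.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# 1. Remote server: generate a key pair with an empty passphrase so cron can use it.
run ssh-keygen -t rsa -N "" -f "$KEY"

# 2. Append the public key to authorized_keys on the backup server.
run ssh-copy-id -i "$KEY.pub" "$BACKUP_HOST"

# 3. Remote server: confirm key-based login works without a password prompt.
run ssh -i "$KEY" -o BatchMode=yes "$BACKUP_HOST" true

# 4. Print a crontab line that pulls the files every night at 02:00.
cron_line() {
    printf '%s\n' "0 2 * * * rsync -az -e 'ssh -i $KEY' $BACKUP_HOST:$SRC_DIR $DEST_DIR"
}
cron_line
```

The line printed by `cron_line` is what you would paste in via `crontab -e` on the remote server.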