I run two LAMP web servers at different providers for disaster recovery purposes: a high-powered live server and a low-powered backup server.
Currently I rsync all the data from the live server to the backup server once every 4 hours.
This works OK, but it does spike system load whilst rsync figures out which files have changed.
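For reference, the current sync is just a cron job on the live server along these lines (the paths and the "backup" hostname are placeholders, not my real setup):

# crontab on the live server: mirror the web root every 4 hours
0 */4 * * * rsync -az --delete /var/www/ backup:/var/www/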
Since all the websites also live in git repositories, I'm wondering whether a git push would be a better backup technique.
I'd have to include the live uploads folder in the git repo, and then the backup process would be:
live$ git add .
live$ git commit -a -m "{date-time} snapshot"
live$ git push backup live_branch
and then have a post-receive hook on the backup server to check out the working tree on every push.
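A minimal sketch of such a hook, assuming a bare repo on the backup server and a work tree at /var/www/site (both paths hypothetical):

#!/bin/sh
# hooks/post-receive in the bare repo on the backup server
# force-checkout the pushed branch into the serving directory
GIT_WORK_TREE=/var/www/site git checkout -f live_branch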
Each website ranges in size from 50 MB to 2 GB. I'd end up with about 50 separate git repos.
Is this a "better" solution than rsync?
- Is git better at calculating which files have changed?
- Is git push more efficient than rsync?
- What have I forgotten?
Thanks!
---- Data from some comparison tests ----
1) 52M folder then adding a new 500k folder (mainly text files)
rsync
sent 1.47K bytes received 285.91K bytes
total size is 44.03M speedup is 153.22
real 0m0.718s user 0m0.044s sys 0m0.084s
git
Counting objects: 38, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (37/37), done.
Writing objects: 100% (37/37), 118.47 KiB, done.
Total 37 (delta 3), reused 0 (delta 0)
real 0m0.074s user 0m0.029s sys 0m0.045s
2) 1.4G folder then adding a new 18M folder (mainly images)
rsync
sent 3.65K bytes received 18.90M bytes
total size is 1.42G speedup is 75.17
real 0m5.311s user 0m0.784s sys 0m0.328s
git
Counting objects: 108, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (106/106), done.
Writing objects: 100% (107/107), 17.34 MiB | 5.21 MiB/s, done.
Total 107 (delta 0), reused 0 (delta 0)
real 0m15.334s user 0m5.202s sys 0m1.040s
3) 52M folder then adding a new 18M folder (mainly images)
rsync
sent 2.46K bytes received 18.27M bytes 4.06M bytes/sec
total size is 62.38M speedup is 3.41
real 0m4.124s user 0m0.640s sys 0m0.188s
git
Counting objects: 108, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (106/106), done.
Writing objects: 100% (107/107), 17.34 MiB | 5.43 MiB/s, done.
Total 107 (delta 1), reused 0 (delta 0)
real 0m6.990s user 0m4.868s sys 0m0.573s
4) 1.4G folder then adding a new 500k folder (mainly text)
rsync
sent 2.66K bytes received 916.04K bytes 612.47K bytes/sec
total size is 1.42G speedup is 1547.14
real 0m1.191s user 0m0.180s sys 0m0.268s
git
Counting objects: 49, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (48/48), done.
Writing objects: 100% (48/48), 177.90 KiB, done.
Total 48 (delta 3), reused 0 (delta 0)
real 0m1.776s user 0m0.390s sys 0m0.497s
5) 1.4G folder – no change
rsync
sent 1.72K bytes received 716.44K bytes 287.26K bytes/sec
total size is 1.42G speedup is 1979.18
real 0m1.092s user 0m0.168s sys 0m0.272s
git
nothing to commit (working directory clean)
real 0m0.636s user 0m0.268s sys 0m0.348s
6) 52M folder – no change
rsync
sent 528 bytes received 88.40K bytes 59.29K bytes/sec
total size is 62.38M speedup is 701.41
real 0m0.779s user 0m0.044s sys 0m0.144s
git
nothing to commit (working directory clean)
real 0m0.156s user 0m0.057s sys 0m0.097s
Best Answer
Actually, I would suggest using a balanced mix of both. Commit your main backup to git (at least) every night, and rsync it once or twice a week to another machine kept well away from the production box.
Git will help you with immediate recovery, and it also makes analysing the data easier because your backup is versioned and has a changelog. After any major change to the data, you can commit and push to git manually and record the reason in the log. If git goes bad, rsync will come to the rescue, but keep in mind that you'll still lose data depending on the rsync frequency.
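As a rough sketch, the two schedules could look like this in the live server's crontab (the times, paths, and the "backup"/"offsite" remote names are only examples):

# nightly git snapshot at 02:00; commit fails harmlessly when nothing changed
0 2 * * * cd /var/www/site && git add -A && git commit -m "nightly snapshot" && git push backup live_branch
# weekly off-site rsync on Sunday at 03:00
0 3 * * 0 rsync -az --delete /var/www/ offsite:/backup/www/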
Rule of thumb: when it comes to backups and disaster recovery, nothing guarantees 100% recovery.