Linux – Backup of machines running Linux and Windows


I have a backup problem.
I have a network (built on 150 Mbps wireless and Gigabit Ethernet) consisting of at least 3 computers (plus maybe 2 remote ones).

I plan to build a fairly powerful Linux server which will provide (roughly):

  • Media center (recording / playback)
  • FTP server to serve files on my network
  • Other servers for developing applications (MySQL, Apache, …)
  • BACKUPS

Concerning the BACKUPS aspect, the machines to be backed up are running:

  • 3 x Linux >= 2.6.30 (Gentoo and Arch Linux)
  • 1 x Windows XP 32-bit
  • 3 x Windows 7 64-bit
  • 1 x Windows 7 32-bit

The backup might be performed over an SMB file share (I haven't had much luck with it lately), rsync, SVN, tar, or anything else, alone or combined, that you might suggest.
The required features are (in order of priority):

  • Revisions (SVN-style): a file has to be backed up each time it gets modified (multiple versions of the same file can, and in fact must, exist on the server)
  • Scalability: if I attach a USB drive to the computer, I want its data to be backed up as well (on Linux that might be quite easy: simply back up all of /media/ except CDs and DVDs. But on Windows?)
  • Near real-time (~5 minutes at most) file backup: I lost a LaTeX report and it was hard to reconstruct it from scratch
  • No duplication: for instance, if I back up the USB disk's content from 2 different computers, I do not want the data to be backed up twice (a symlink instead of a full copy, in the worst case)
  • Manual restore / automatic restore: it's the same for me (just not as described below)
  • I do not want to look through 1000 folders that each repeat the same directory structure with only 10 files in it (I prefer to look in ONE directory to find all the latest files in the file system structure, like /media/BACKUPS/PC01/home//… )
  • Maybe the ability to remove / exclude large files from backups
  • Good logs

Server specs:

  • 2 x 2TB of hard disk space for backups (in fact one disk is used for backups and the other will be rsynced from the first one, as I prefer not to use RAID 1, just in case… )
  • 4 to 8GB of DDR3 RAM
  • At least 4 cores (AMD Athlon II X4 640 @ 3.0 GHz) -> upgradeable to Bulldozer later

What I have already considered (I might reconsider it if you point out some interesting characteristic):

  • BackupPC
  • rsync (problem: no file versioning, and the Windows client might be buggy)
  • SVN (problem: 2 x overhead, since files are copied twice and thus use twice the disk space)
  • Amanda / Bacula (I have not really understood what they can and can't do)

I know a bit of Bash and Python programming on the server side.
I might eventually even build a web interface using Apache / PHP / MySQL.
All I need to know is the best components to use to achieve this (i.e. which backup software on the server, which protocol, which client, and which features to implement accordingly).

Best Answer

You can do very well with Bacula/Amanda. Hitting your requirements:

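Revisions (SVN-style): a file has to be backed up each time it gets modified (multiple versions of the same file can, and in fact must, exist on the server)
Bacula and Amanda will grab a file on every backup run in which it has changed since the previous run, and keep the older versions for as long as your retention settings allow.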

Scalability: if I attach a USB drive to the computer, I want its data to be backed up as well (on Linux that might be quite easy: simply back up all of /media/ except CDs and DVDs. But on Windows?)
Not bad on Unix (Just back up everything under / and it will grab the media), but probably not possible on Windows -- I believe you need to specify the drives you want to grab because the filesystem isn't a tree hierarchy under a specific root (there's a root for each drive).
That said, it's probably NOT a good idea (What if you attach a full 1TB drive to a machine being backed up? Your backups just ballooned).

Near real-time (~5 minutes at most) file backup: I lost a LaTeX report and it was hard to reconstruct it from scratch
Not happening -- You CAN specify a 5-minute backup window, but your logs will be filled with jobs being killed because there's already a duplicate running.
You can schedule nightly backups, or even every 12 hours without much trouble.
(Even Apple's Time Machine only does hourly backups... think about the largest file that may change and have to be shoved over the wire...)
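
If you still want something closer to 5 minutes for a handful of small working directories (the LaTeX report case), a small script run from cron alongside the regular backup jobs is one option. The following is only a minimal sketch of the "find what changed recently" step, not anything Bacula or Amanda provides; the function name `files_changed_since` and the idea of feeding its output to a copy step are assumptions made for illustration.

```python
import os


def files_changed_since(root, cutoff):
    """Yield paths under root modified after cutoff (seconds since the epoch)."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.stat(path).st_mtime > cutoff:
                    yield path
            except OSError:
                # The file vanished or is unreadable between listing and stat.
                continue
```

Calling it with `time.time() - 5 * 60` as the cutoff lists everything touched in the last five minutes. Keeping the watched tree small matters here, because each run walks the whole directory.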

No duplication: for instance, if I back up the USB disk's content from 2 different computers, I do not want the data to be backed up twice (a symlink instead of a full copy, in the worst case)
Bacula doesn't have deduplication at this time. Not sure about Amanda.
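
If you end up scripting part of this yourself, file-level deduplication can be approximated with a content-addressed pool plus hard links. This is a rough sketch under stated assumptions, not a feature of Bacula or Amanda: the names `file_digest`, `store_deduplicated` and `pool_dir` are made up for the example, and the pool and the backup tree are assumed to live on the same filesystem so that hard links work.

```python
import hashlib
import os
import shutil


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def store_deduplicated(src, dest, pool_dir):
    """Store src at dest so that identical content occupies disk space once.

    pool_dir holds one file per distinct content hash; dest is a hard link
    to the pooled file. Hypothetical layout, for illustration only.
    """
    os.makedirs(pool_dir, exist_ok=True)
    pooled = os.path.join(pool_dir, file_digest(src))
    if not os.path.exists(pooled):
        shutil.copy2(src, pooled)  # first time this content is seen
    os.makedirs(os.path.dirname(dest) or ".", exist_ok=True)
    if os.path.lexists(dest):
        os.remove(dest)  # replace whatever was stored at this path before
    os.link(pooled, dest)
```

Backing up the same USB disk from two machines then produces two directory entries pointing at one copy of the data. Note that this sketch overwrites the previous entry at `dest`, so keeping revisions would need a per-run destination directory.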

Manual restore / automatic restore: it's the same for me (just not as described below)
Restores are (and should be) a manual process. I have no idea what an "automatic restore" would look like (the backup server decides on its own to restore a file? :)

Maybe ability to remove / exclude large files from backups
You can include or exclude specific parts of the filesystem (down to file-level granularity) in Bacula.
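
For a home-grown rsync or tar script, the same idea can be expressed as a pre-filter that builds the file list. The sketch below is a generic illustration and does not show Bacula's configuration; `select_for_backup`, the size limit and the patterns are all assumptions chosen for the example.

```python
import fnmatch
import os


def select_for_backup(root, max_bytes, exclude_patterns=()):
    """Return (included, skipped) sorted lists of file paths under root.

    A file is skipped when it is larger than max_bytes or when its name
    matches one of the shell-style exclude_patterns (for example "*.iso").
    """
    included, skipped = [], []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                too_big = os.path.getsize(path) > max_bytes
            except OSError:
                continue  # unreadable or vanished file: leave it out entirely
            excluded = any(fnmatch.fnmatch(name, p) for p in exclude_patterns)
            (skipped if too_big or excluded else included).append(path)
    return sorted(included), sorted(skipped)
```

Logging the `skipped` list on every run makes it obvious which large files are not protected, which ties in with the "good logs" requirement.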

Good logs
Database-backed lists of jobs and results, with the ability to write to log files, email, etc. in the event of errors.


BackupPC may also be able to hit these requirements (not certain - haven't used it) - other commercial backup solutions almost certainly can as well.
You may also want to consider tarsnap, though I'm not sure how the Windows support is.
