How to back up a distributed file system

backup, distributed-filesystems

Note: This is a "theoretical" question, as I haven't got that kind of data yet.

If you have a distributed file system spanning a dozen or more servers and terabytes of data, how do you back it up? Local tape drives aren't an option, as I am renting the servers and don't have physical access to them. The way I see it, I simply must have a backup cluster proportional in size to the source cluster. Sending all of that data over the network in parallel would probably saturate it, dragging throughput down; yet the backups all have to be taken at the same time, so round-robin backups don't seem to make sense either. One way around this would be to reserve a small portion of the (in my case large) drives and keep the rest for rotating local LVM snapshots. Unfortunately, that kind of backup would be useless if a server were compromised. Are there any other options for creating a point-in-time backup that doesn't kill the network?
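For concreteness, here is roughly what I mean by rotating local LVM snapshots; the volume group, names, and sizes below are made up:

    # Assumes the data LV is /dev/vg0/data and vg0 has spare extents
    # reserved for snapshot copy-on-write space.
    lvcreate --snapshot --size 50G --name data-snap /dev/vg0/data

    # Back up from the frozen snapshot while the live volume keeps serving.
    mount -o ro /dev/vg0/data-snap /mnt/snap
    rsync -a /mnt/snap/ /backup/$(date +%F)/

    # Rotate: drop the snapshot once the copy is done.
    umount /mnt/snap
    lvremove -f /dev/vg0/data-snap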

[EDIT] SOLUTION:

1) Replicate the entire data set in (near) real time to one large local backup server, so that bandwidth usage and I/O are spread over the day, and local bandwidth is usually "free".
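A rough sketch of what that replication looks like (hostnames and paths are placeholders; here the backup server pulls with rsync from a frequent cron job, which is one way to approximate "near real time"):

    #!/bin/sh
    # Runs on the backup server, e.g. every few minutes from cron.
    # Pulls a mirror of each storage node into its own directory.
    for node in node01 node02 node03; do
        rsync -a --delete "$node":/data/ "/backup/$node/"
    done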

2) Create the real backup off that machine and send it off-site. With all the data in one place, a differential backup is straightforward, which saves billable bandwidth.
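And the off-site step, assuming rsync with --link-dest so unchanged files are hard-linked against the previous run rather than resent (the off-site host and paths are again placeholders):

    #!/bin/sh
    # Differential off-site run from the backup server. Only changed
    # files cross the billable link; the rest become hard links on the
    # remote side against yesterday's tree.
    TODAY=$(date +%F)
    YESTERDAY=$(date -d yesterday +%F)
    rsync -a --delete \
        --link-dest="/backup/$YESTERDAY" \
        /backup/ \
        "offsite.example.com:/backup/$TODAY/"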

Best Answer

If you find that you have more data than you can copy within your backup window, then you need to look into replicating your entire data set off-site in real time, or as close to it as you can get, using separate infrastructure (different subnets, VLANs, a different pipe to the outside world, etc.).

I would use iSCSI; specifically, I would use Openfiler to replicate my back-end data to the outside world, plus you get the other goodies that come with Openfiler.
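If you'd rather not run Openfiler, a plain Linux iSCSI target (scsi-target-utils/tgt here, not Openfiler's own configuration) along these lines does the basic job; the IQN, backing device, and subnet are placeholders:

    # /etc/tgt/targets.conf -- minimal sketch of exporting a block
    # device over iSCSI so a remote host can replicate onto it.
    <target iqn.2010-01.com.example:backup.lun0>
        backing-store /dev/vg0/backup
        initiator-address 10.0.0.0/24
    </target>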

Failing that, I would use DRBD locally (assuming Linux), replicate to a few other servers, and then run my backups off them.
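A minimal two-node DRBD resource as a sketch (hostnames, IPs, and devices are placeholders; asynchronous protocol A is the forgiving choice for a slower replication link):

    # /etc/drbd.d/r0.res -- replicate /dev/vg0/data from the primary
    # to a backup node; "on <name>" must match each host's uname -n.
    resource r0 {
        protocol A;               # asynchronous replication
        on primary-node {
            device    /dev/drbd0;
            disk      /dev/vg0/data;
            address   10.0.0.1:7788;
            meta-disk internal;
        }
        on backup-node {
            device    /dev/drbd0;
            disk      /dev/vg0/data;
            address   10.0.0.2:7788;
            meta-disk internal;
        }
    }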


The best advice I can offer is to separate your critical data and make sure it's copied to redundant disk space, like a SAN or at the very least a NAS. That way you can deploy pretty much any local backup mechanism you want, knowing you're safe because your critical data is replicated off-site anyway. It's a pain, and management may not agree at first, but ask them to put figures on how much they would lose in a week of downtime; you'll find that your budget miraculously increases!