Duplicity Full Backup Lifetime and Efficiency

Tags: backup, duplicity, incremental-backup

I'm trying to work out a backup strategy for some clients, and am leaning towards duplicity for remote backup (I already use rdiff-backup for internal, on-location backups).

Is it reasonable to want a full backup every so often? Since duplicity increments forward, each incremental backup is relying on the previous increment, and all are relying heavily on the last full backup. Should that become corrupt, bad things happen. A related question: Does Duplicity test the incremental backups for consistency?

Assuming I do want a full backup every so often, how efficiently does duplicity create that full backup? Can/does it check file signatures and copy unchanged data from previous full backups/increments? In other words, can it create a new 'full' archive by transferring only new or changed data and merging in the existing unchanged data?

My concern is that periodic full backups are needed, but their consistently large bandwidth use will make this unreasonable for some clients.

Best Answer

I think it's reasonable to want a full backup every so often: most of my machines are configured to do one every few months. There's nothing magic about that number: the right value is going to depend on how much data you have, how fast it changes, how likely you are to want to restore from anything other than the most recent snapshot, how much traffic and storage costs you, and how paranoid you are. Other people might want a full backup every week.
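For what it's worth, duplicity can enforce this kind of schedule itself through its `--full-if-older-than` option, which starts a new full backup once the latest one is older than the given age. Below is a minimal Python sketch that only assembles the command line. The helper name `build_backup_command`, the paths, the target URL, and the `3M` interval are all placeholder assumptions, not recommendations.

```python
import shlex

def build_backup_command(source, target_url, full_every="3M"):
    """Return the argv for a duplicity run that makes a new full backup
    once the latest full is older than `full_every`, and an incremental
    otherwise. `full_every` uses duplicity's interval format
    (e.g. "1W" for one week, "3M" for three months)."""
    return ["duplicity", "--full-if-older-than", full_every, source, target_url]

if __name__ == "__main__":
    # Placeholder source and target: adjust to the client's setup.
    cmd = build_backup_command("/home/client", "sftp://backup@example.com/client")
    # Print the command; pass `cmd` to subprocess.run(cmd, check=True) to run it.
    print(shlex.join(cmd))
```

Run from cron or a similar scheduler, the same command then covers both the routine incrementals and the occasional full.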

Unless you do a full backup from time to time, the chain of increments keeps getting longer, so the archive size and recovery time will continue to grow.

I don't think duplicity has a dedicated "check" command (see http://pad.lv/660895), though it would be nice if it did. Either way, it is prudent to do a test restore every so often.
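One way to make a test restore meaningful is to restore into a scratch directory (for example with `duplicity restore <target_url> <scratch_dir>`) and then compare it against the live data. The following is a rough Python sketch of that comparison, not anything duplicity provides; `file_digest`, `tree_digests` and `compare_trees` are hypothetical helper names. Files modified since the backup ran will naturally show up as changed.

```python
import hashlib
from pathlib import Path

def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so large files fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def tree_digests(root):
    """Map each regular file's path (relative to root) to its digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in root.rglob("*")
        if p.is_file()
    }

def compare_trees(original, restored):
    """Return (missing, extra, changed) lists of relative paths.

    missing: in original but not in the restore
    extra:   in the restore but not in original
    changed: present in both with different contents
    """
    a, b = tree_digests(original), tree_digests(restored)
    missing = sorted(set(a) - set(b))
    extra = sorted(set(b) - set(a))
    changed = sorted(p for p in set(a) & set(b) if a[p] != b[p])
    return missing, extra, changed
```

Three empty lists from `compare_trees(source_dir, scratch_dir)` mean the restore matches the source byte for byte; metadata such as permissions and timestamps is not checked here.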

A related question is whether you should keep more than one backup chain. Again, it depends on the cost. One reason to keep an older chain is that you could restore from it if the current chain is corrupt, whether because of hardware failure, OS failure, or a duplicity bug. Of course, if the old chain is very old, restoring from it may be of limited value.
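If you settle on keeping a fixed number of chains, duplicity's `remove-all-but-n-full` command can prune the older ones. Without `--force` it only lists what it would delete. Here is a small Python sketch that builds that command; `build_prune_command` is a hypothetical helper, and the default of two chains is just an assumption for illustration.

```python
def build_prune_command(target_url, keep_full=2, force=False):
    """Return the argv that keeps the newest `keep_full` full backups
    (and their increments) and removes older chains.

    With force=False duplicity only reports what would be deleted.
    """
    if keep_full < 1:
        raise ValueError("keep_full must be at least 1")
    cmd = ["duplicity", "remove-all-but-n-full", str(keep_full)]
    if force:
        cmd.append("--force")  # actually delete instead of just listing
    cmd.append(target_url)
    return cmd
```

Doing a dry run first (`force=False`) and reading the list before deleting anything seems the safer habit.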

Making a full backup always uploads a full copy of the data; duplicity does not reuse unchanged data already stored in earlier fulls or increments.

If the client's concern is the fraction of bandwidth used, rather than traffic charges, you might want to run duplicity under a bandwidth limiter such as trickle.
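As a rough illustration, trickle can wrap a command in standalone mode (`-s`) with an upload cap in KB/s (`-u`). The Python sketch below builds such a wrapped command and estimates how long a capped full backup would take. The helper names, the 100 KB/s cap and the 50 GiB size are made-up assumptions; the estimate treats trickle's KB as 1024 bytes and ignores compression and protocol overhead.

```python
def throttle(cmd, upload_kb_s):
    """Prefix a command with trickle in standalone mode, capping its
    upload rate at `upload_kb_s` KB/s."""
    return ["trickle", "-s", "-u", str(upload_kb_s)] + list(cmd)

def estimated_upload_hours(size_gib, upload_kb_s):
    """Hours needed to upload `size_gib` GiB at a steady `upload_kb_s` KB/s."""
    size_kib = size_gib * 1024 * 1024
    return size_kib / upload_kb_s / 3600

if __name__ == "__main__":
    # Placeholder source and target for a forced full backup.
    full = ["duplicity", "full", "/home/client", "sftp://backup@example.com/client"]
    print(" ".join(throttle(full, 100)))
    print(f"{estimated_upload_hours(50, 100):.1f} hours")
```

An estimate like this helps decide whether a throttled full backup can finish before the next one is due.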
