Rsync for backup considered dangerous


In our research group, we need to back up data acquired on the MRI scanner in a way that preserves every scan ever acquired, even if the data is later deleted from the scanner for lack of space or other reasons. We call this our vault.

To store data in the vault, a separate machine NFS-mounts the scanner's data partition and copies the data to its own local backup hard disk:

rsync -au /nfsmount/data /pvbackup-vault >> "$LOGFILE"

My question is: is this safe? Our data sometimes gets reprocessed after it has already been processed once, so I want the -u flag.
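For reference on the -u question: according to the rsync manual, -u (--update) only skips files that are newer on the receiving side. It therefore does not protect a vault copy when the scanner file is later overwritten, because the overwritten source file is then the newer one. The sketch below demonstrates this in a throwaway temporary directory; the function name demo_update_flag is hypothetical, and it assumes rsync, mktemp -d and touch -t are available.

```shell
#!/bin/sh
# Hypothetical demo: what rsync -u does when a scanner file is overwritten.
# Everything happens inside a throwaway temporary directory.
demo_update_flag() {
    tmp=$(mktemp -d) || return 1
    mkdir -p "$tmp/scanner" "$tmp/vault"

    # Original scan, timestamped 2020-01-01 00:00, copied to the vault.
    printf 'raw scan v1\n' > "$tmp/scanner/scan.dat"
    touch -t 202001010000 "$tmp/scanner/scan.dat"
    rsync -au "$tmp/scanner/" "$tmp/vault/"

    # Later the scanner copy is overwritten by mistake; its mtime is now newer.
    printf 'garbage\n' > "$tmp/scanner/scan.dat"
    touch -t 202101010000 "$tmp/scanner/scan.dat"
    rsync -au "$tmp/scanner/" "$tmp/vault/"

    # -u did not skip the file: print what the vault now holds.
    cat "$tmp/vault/scan.dat"
    rm -rf "$tmp"
}
```

Running demo_update_flag prints "garbage": the vault copy followed the bad overwrite.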

For the actual raw data (which is sacred), I can foresee one problem: if files on the scanner get overwritten through some error, mistake, or unforeseeable circumstance, the corresponding data in the vault will be overwritten too. I am not sure how to protect against that. On the one hand, I would like to allow data to be reprocessed, maybe even re-acquired; on the other hand, I would like a vault that is immune to future changes, at least for the raw data.
Should I flag those cases and deal with them by hand? That would be tedious.
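One way to soften this, offered as a sketch under stated assumptions rather than a tested recipe, is rsync's --backup/--backup-dir pair: before a vault file is replaced, rsync moves the old copy into a separate directory, so an accidental overwrite on the scanner no longer destroys the only vault copy. The wrapper name vault_sync, the .superseded directory and the timestamp layout are hypothetical choices for illustration.

```shell
#!/bin/sh
# Hypothetical wrapper for the vault job (names are illustrative).
# Usage: vault_sync SRC VAULT LOGFILE
vault_sync() {
    src=$1 vault=$2 logfile=$3
    stamp=$(date +%Y%m%d-%H%M%S)
    # --backup with --backup-dir: before a vault file is replaced, rsync moves
    # the old copy to $vault/.superseded/$stamp/<same relative path>.
    # No --delete is used, so nothing under .superseded is ever removed.
    rsync -au --backup --backup-dir="$vault/.superseded/$stamp" \
        "$src" "$vault/" >> "$logfile" 2>&1
}
```

Called as vault_sync /nfsmount/data /pvbackup-vault "$LOGFILE", it would keep every superseded version under /pvbackup-vault/.superseded/<timestamp>/data/. That directory only grows, so its disk usage needs watching.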

Note:
I have a separate incremental strategy (rsnapshot) in place to protect against user error. It allows recovery of inadvertently deleted or changed data going back a certain number of hours, days, weeks, or months.

Note2:
I should probably mention that we are currently dealing with about 250 GB, plus about 10 GB of newly acquired data per week. So DVDs are out as an alternative.
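A quick back-of-the-envelope check of the DVD remark, assuming single-layer discs of 4.7 GB and decimal units (1 GB = 1000 MB); discs_needed is a hypothetical helper:

```shell
#!/bin/sh
# Rough DVD count, assuming 4.7 GB (4700 MB) single-layer discs.
# discs_needed MB -> number of discs, using integer ceiling division.
discs_needed() {
    echo $(( ($1 + 4700 - 1) / 4700 ))
}

echo "existing data (250 GB): $(discs_needed 250000) discs"
echo "new data per week (10 GB): $(discs_needed 10000) discs"
echo "one year of new data (520 GB): $(discs_needed 520000) discs"
```

This gives 54 discs for the existing data, 3 discs per week, and 111 discs for one year of growth, before any verification or redundant copies.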

Best Answer

You are doing the first part right: get the data off the scanner in case something happens to it. The second part should be to back up your intermediate backup location. In other words, you should either set up a secondary rsync job to another final resting/backup place, or have a backup program take periodic copies for more permanent, archival purposes.

Often you will do two things to preserve data:

  1. Take an immediate disk-to-disk backup (the scanner data copied with rsync).
  2. Take an archival backup: disk to tape or, with some newer methods, another disk-to-disk or disk-to-web mechanism.
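The second step can be sketched as a dated snapshot of the vault on a separate archive disk. This is a minimal sketch, assuming rsync's --link-dest option (the hard-link approach rsnapshot is also built on): unchanged files are hard-linked to the previous snapshot, so each snapshot looks like a full copy while only changed files take new space, and a later change in the vault does not alter files in older snapshots. The function name archive_snapshot, the timestamp naming and the "latest" symlink are hypothetical.

```shell
#!/bin/sh
# Hypothetical second-tier job: snapshot VAULT into ARCHIVE_ROOT/<timestamp>.
# Usage: archive_snapshot VAULT ARCHIVE_ROOT
archive_snapshot() {
    vault=$1 root=$2
    stamp=$(date +%Y%m%d-%H%M%S)
    mkdir -p "$root" || return 1
    if [ -d "$root/latest" ]; then
        # Resolve the previous snapshot to an absolute path for --link-dest
        # (a relative path would be taken relative to the destination).
        prev=$(cd "$root/latest" && pwd -P) || return 1
        # Unchanged files become hard links into the previous snapshot;
        # changed or new files are copied in full.
        rsync -a --link-dest="$prev" "$vault/" "$root/$stamp/" || return 1
    else
        # First run: plain full copy.
        rsync -a "$vault/" "$root/$stamp/" || return 1
    fi
    # Repoint the "latest" symlink at the snapshot just written.
    rm -f "$root/latest" && ln -s "$stamp" "$root/latest"
}
```

For example, archive_snapshot /pvbackup-vault /mnt/archive (hypothetical mount point) could run from cron after the vault sync. Files inside the snapshots should be treated as read-only, because hard-linked copies share their content.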

This ensures that your scanner data is protected and that you have long-term archives of everything that happens. Treat your disk-to-disk copy as a temporary backup until the archival run completes; your archive is sacred.