My solution to this was a different two-pass approach, trading off some disk space: I run rsync --only-write-batch on the server, then rsync the batch file itself to the destination, looping until that rsync succeeds. Once the batch file has fully transferred, rsync --read-batch on the destination recreates all the changes.
There are some unintended benefits to this for me as well:

- Because I'm more concerned that the backup "exists" than that it is "usable", I don't actually run the read-batch on the receiving end every day -- most of the time the batch is relatively small.
- I've been experimenting with --checksum-seed=1. I might be misreading the documentation, but I think it makes the batch files more syncable (i.e. on a day when I skip the --read-batch, the next day's batch syncs faster because the previous day's batch is a good basis).
- If the batch gets too big to send "in time" over the internet, I can sneaker-net it over on an external drive. By "in time" I mean before the next day's backup starts.
- Although I don't personally do this, I could keep two offsite backups in separate locations and send the batch to both of them.
Normally, rsync skips files that have identical sizes and times on the source and destination sides. This heuristic is usually a good idea, as it saves rsync from examining the contents of files that are very likely identical on both sides.
--ignore-times tells rsync to turn off the file-times-and-sizes heuristic, and thus unconditionally transfer ALL files from source to destination. rsync will then read every file on the source side, since it needs either to use its delta-transfer algorithm or to send every file in its entirety, depending on whether the --whole-file option was specified.
--checksum also modifies the file-times-and-sizes heuristic, but here it ignores times and examines only sizes. Files on the source and destination sides that differ in size are transferred, since they are obviously different. Files with the same size are checksummed (with MD5 in rsync version 3.0.0+, or MD4 in earlier versions), and those found to have differing sums are also transferred.
In cases where the source and destination sides are mostly the same, --checksum will result in most files being checksummed on both sides. This can take a long time, but the upshot is that the bare minimum of data will actually be transferred over the wire, especially if the delta-transfer algorithm is used. Of course, this is only a win if you have a very slow network and/or very fast CPUs.
--ignore-times, on the other hand, will send more data over the network, and it will cause all source files to be read, but at least it will not impose the additional burden of computing many cryptographically strong checksums on the source and destination CPUs. I would expect this option to perform better than --checksum when your network is fast and/or your CPUs are relatively slow.
I think I would only ever use --checksum or --ignore-times when transferring files to a destination where I suspected that the contents of some files had been corrupted without their modification times changing. I can't really think of any other good reason to use either option, although there are probably other use cases.
Best Answer
You need to change your find command to look only at files.