Rsync with --hard-links freezes

Tags: hardlink, rsnapshot, rsync

I have a large directory called servers, which contains many hard links made by rsnapshot. The structure is roughly:

./servers
./servers/daily.0
./servers/daily.0/file1
./servers/daily.0/file2
./servers/daily.0/file3
./servers/daily.1
./servers/daily.1/file1
./servers/daily.1/file2
./servers/daily.1/file3
...

The snapshots were created with rsnapshot in a space-saving way: if /servers/daily.0/file1 is identical to /servers/daily.1/file1, both names point to the same inode through a hard link, instead of a complete snapshot being copied every cycle.
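At the filesystem level, a hard link is simply a second directory entry for the same inode, so no file data is duplicated. The following Python sketch (assuming a POSIX filesystem that supports hard links; the function name demo_shared_inode and the file contents are made up for illustration) mirrors the daily.1/file1 to daily.0/file1 case and checks that both names report the same inode and a link count of 2.

```python
import os
import tempfile


def demo_shared_inode() -> tuple[int, int, int]:
    """Hypothetical demo: create daily.1/file1, hard-link it as daily.0/file1,
    and return (inode of daily.0/file1, inode of daily.1/file1, link count)."""
    with tempfile.TemporaryDirectory() as root:
        old = os.path.join(root, "daily.1")
        new = os.path.join(root, "daily.0")
        os.mkdir(old)
        os.mkdir(new)
        src = os.path.join(old, "file1")
        with open(src, "w") as f:
            f.write("unchanged contents\n")
        # A hard link adds a second name for the same inode; no data is copied.
        dst = os.path.join(new, "file1")
        os.link(src, dst)
        a, b = os.stat(dst), os.stat(src)
        # Values are read before the temporary directory is cleaned up.
        return a.st_ino, b.st_ino, a.st_nlink


if __name__ == "__main__":
    ino_new, ino_old, nlink = demo_shared_inode()
    print(ino_new == ino_old, nlink)  # prints "True 2"
```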

I tried to copy it while preserving the hard-link structure, to save space on the destination drive, using:

nohup time rsync -avr --remove-source-files --hard-links servers /old_backups

After a while, rsync freezes: no new lines are added to nohup.out, and no files seem to move from one drive to the other. Running it without nohup didn't solve the problem.

Any idea what's wrong?

Adam

Best Answer

My answer, which I give from hard-earned experience, is: Don't do this. Don't try to copy a directory hierarchy that makes heavy use of hard links, such as one created using rsnapshot or rsync --link-dest or similar. It won't work on anything but small datasets. At least, not reliably. (Your mileage may vary, of course; perhaps your backup datasets are much smaller than mine were.)

The problem with using rsync --hard-links to recreate the hard-linked structure on the destination side is that discovering the hard links on the source side is expensive. rsync has to build a map of inodes in memory to work out which files are linked together, and that map has to cover every hard-linked file in the transfer. Unless your source has relatively few files, this can and will blow up. In my case, when I learned of this problem and was looking around for alternatives, I tried cp -a, which is also supposed to preserve the hard-link structure on the destination. It churned away for a long time and then finally died (with a segfault, or something like that).
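rsync's real data structures are not described here and differ in detail, so the following is only a Python sketch of the general technique (the function names group_hard_links and demo are hypothetical): walk the tree, key every regular file whose link count is above 1 by its (device, inode) pair, and collect the names that share a key. The dictionary must hold every multiply-linked file until the walk finishes, which is why memory grows with the size of a hard-link-heavy tree.

```python
import os
import stat
import tempfile
from collections import defaultdict


def group_hard_links(top: str) -> dict[tuple[int, int], list[str]]:
    """Map (st_dev, st_ino) -> sorted paths of multiply-linked regular files."""
    groups: dict[tuple[int, int], list[str]] = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(top):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            # Only regular files with more than one link can share an inode.
            if not stat.S_ISREG(st.st_mode) or st.st_nlink < 2:
                continue
            # Every such file stays in this in-memory table until the end.
            groups[(st.st_dev, st.st_ino)].append(path)
    return {key: sorted(paths) for key, paths in groups.items()}


def demo() -> dict[tuple[int, int], list[str]]:
    """Build a tiny rsnapshot-like tree and group its hard links."""
    with tempfile.TemporaryDirectory() as root:
        for snap in ("daily.0", "daily.1"):
            os.mkdir(os.path.join(root, snap))
        # file1 is unchanged between snapshots: one inode, two names.
        with open(os.path.join(root, "daily.1", "file1"), "w") as f:
            f.write("same\n")
        os.link(os.path.join(root, "daily.1", "file1"),
                os.path.join(root, "daily.0", "file1"))
        # file2 changed between snapshots: two independent inodes.
        for snap, text in (("daily.1", "old\n"), ("daily.0", "new\n")):
            with open(os.path.join(root, snap, "file2"), "w") as f:
                f.write(text)
        groups = group_hard_links(root)
        # Relative paths stay meaningful after the temp dir is removed.
        return {k: [os.path.relpath(p, root) for p in v]
                for k, v in groups.items()}


if __name__ == "__main__":
    for paths in demo().values():
        print(paths)  # on POSIX: ['daily.0/file1', 'daily.1/file1']
```

A link-preserving copier would create the first path of each group on the destination and hard-link the remaining paths to it. Running just the grouping step against a real snapshot tree can also serve as a rough gauge of how many entries such a table would need.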

My recommendation is to set aside an entire partition for your rsnapshot backup. When it fills up, bring another partition online. It is much easier to move around hard-link-heavy datasets as entire partitions, rather than as individual files.
