Linux – How to mirror filesystems with millions of hardlinks

hardlink linux performance rsync

We have one big problem at the moment: we need to mirror a filesystem for one of our customers. That is usually not a problem, but here it is:

On this filesystem there is one folder with millions of hardlinks (yes, MILLIONS!). rsync needs more than 4 days just to build the file list.

We use the following rsync options:

    rsync -Havz --progress serverA:/data/cms /data/

Does anyone have an idea how to speed up this rsync, or know of alternatives? We cannot use dd because the target disk is smaller than the source.

UPDATE:
As the original filesystem is ext3, we will try dump and restore. I will keep you up to date.

Best Answer

You need to upgrade both sides to rsync 3. From the changelog:

- A new incremental-recursion algorithm is now used when rsync is talking
  to another 3.x version.  This starts the transfer going more quickly
  (before all the files have been found), and requires much less memory.
  See the --recursive option in the manpage for some restrictions.

It has been over 2 years since rsync 3.0.0 was released, but, unfortunately, most enterprise distributions are based off code older than that, which means you're probably using rsync 2.6.
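
A quick way to see which version each side runs is to parse the first line of `rsync --version`. The sketch below assumes that line looks like `rsync  version 3.0.7  protocol version 30`; the helper names `rsync_major` and `has_rsync3` are made up for illustration and are not part of rsync.

```shell
# Read `rsync --version` output on stdin and print the major version number.
# Assumes the first line has the form:
#   rsync  version 3.0.7  protocol version 30
rsync_major() {
    sed -n '1s/^rsync  *version v\{0,1\}\([0-9][0-9]*\)\..*/\1/p'
}

# Succeed if the version read from stdin is 3 or newer, which is the
# minimum the changelog names for incremental recursion.
has_rsync3() {
    major=$(rsync_major)
    [ -n "$major" ] && [ "$major" -ge 3 ]
}
```

To use it, run `rsync --version | rsync_major` on the local machine and `ssh serverA rsync --version | rsync_major` for the remote side (`serverA` being the host from the question). Both need to report 3 or higher.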

For reference (if anyone else is having this problem): if you are already running rsync 3 on both sides and the file list still takes this long to build, you may be using options that are incompatible with incremental recursion. From the man page:

    Some options require rsync to know the full file list, so  these
    options  disable the incremental recursion mode.  These include:
    --delete-before,   --delete-after,    --prune-empty-dirs,    and
    --delay-updates.
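
As a rough illustration, a small shell function can scan a planned rsync command line for the four options listed in that excerpt. The function name `find_blocking_options` is hypothetical, and this sketch only matches the long spellings, so short forms or options bundled together would slip through.

```shell
# Print every argument that the man page excerpt lists as disabling
# incremental recursion. Only the long option spellings are checked.
find_blocking_options() {
    for arg in "$@"; do
        case "$arg" in
            --delete-before|--delete-after|--prune-empty-dirs|--delay-updates)
                printf '%s\n' "$arg"
                ;;
        esac
    done
}
```

Calling `find_blocking_options -Havz --progress serverA:/data/cms /data/` with the options from the question prints nothing, which suggests that in that case the rsync version, not the option set, is the more likely cause.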

Also, again, both sides must be running rsync 3 for incremental recursion to be supported.
