I would use rsync: if it is interrupted for any reason, you can restart it at very little cost, and being rsync, it can even resume part way through a large file. As others mention, it can exclude files easily. The simplest way to preserve most things is to use the -a ("archive") flag. So:
rsync -a source dest
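For example (just a sketch; the --exclude patterns here are purely illustrative, and --partial tells rsync to keep partially transferred files so a restart can resume them):
rsync -a --partial --exclude='*.tmp' --exclude='cache/' source/ dest/
The trailing slash on source/ copies the directory's contents rather than the directory itself.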
Although UID/GID and symlinks are preserved by -a (see -lpgo), your question implies you might want a full copy of the filesystem information, and -a doesn't include hard links, extended attributes, or ACLs (on Linux), nor the above plus resource forks (on OS X). Thus, for a robust copy of a filesystem, you'll need to include those flags:
rsync -aHAX source dest # Linux
rsync -aHE source dest # OS X
The default cp would start again from the beginning, though the -u flag will "copy only when the SOURCE file is newer than the destination file or when the destination file is missing". And the -a (archive) flag is recursive, won't recopy files if you have to restart, and preserves permissions. So:
cp -au source dest
Sneakernet Anyone?
Assuming this is a one-time copy, I don't suppose it's possible to just copy the file to a CD (or other media) and overnight it to the destination, is there?
That might actually be your fastest option, as a file transfer of that size over that connection might not copy correctly... in which case you get to start all over again.
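If you do ship physical media (or after any long transfer), it's worth verifying the copy; a minimal sketch with sha256sum, where hugefile.tar is just a stand-in name:
sha256sum hugefile.tar > hugefile.tar.sha256   # on the source machine
sha256sum -c hugefile.tar.sha256               # on the destination, after the copy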
rsync
My second choice/attempt would be rsync as it detects failed transfers, partial transfers, etc. and can pick up from where it left off.
rsync --progress file1 file2 user@remotemachine:/destination/directory
The --progress flag will give you some feedback instead of just sitting there and leaving you to second guess yourself. :-)
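If the connection does drop, adding --partial (or -P, which is shorthand for --partial --progress) keeps the partially transferred file in place, so rerunning the same command resumes roughly where it left off instead of starting the file over. A sketch, reusing the paths above:
rsync -P file1 file2 user@remotemachine:/destination/directory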
Vuze (BitTorrent)
Third choice would probably be to try using Vuze as a torrent server and then have your remote location use a standard BitTorrent client to download it. I know of others who have done this, but you know... by the time they got it all set up, running, etc... I could have overnighted the data...
Depends on your situation I guess.
Good luck!
UPDATE:
You know, I got to thinking about your problem a little more. Why does the file have to be a single huge tarball? Tar is perfectly capable of splitting large archives into smaller ones (to span media, for example), so why not split that huge tarball into more manageable pieces and then transfer the pieces over instead?
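A minimal sketch of that idea using split (the 1G chunk size and file names are assumptions; GNU tar's multi-volume mode is another route):
split -b 1G hugefile.tar hugefile.tar.part_        # carve the existing tarball into 1 GiB pieces
cat hugefile.tar.part_* > hugefile.tar             # reassemble on the destination
tar -cf - /path/to/data | split -b 1G - backup.tar.part_   # or split while creating, skipping the giant intermediate file
Each piece can then be transferred (and retried) independently.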
Best Answer
I doubt CPU is the limiting factor here. You're most likely limited by both network bandwidth for the transfer and disk I/O, especially latency for all those stat calls.
Can you break down the filesystem hierarchy into smaller chunks to process in parallel?
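One rough way to do that, assuming GNU xargs and that the top-level directory names contain no spaces or newlines (host and paths are placeholders):
ls /source | xargs -P4 -I{} rsync -aH --partial /source/{} user@remotemachine:/destination/
The -P4 runs four rsyncs at once; tune that to what the disks and the link can actually sustain.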
What are the source files, and what's writing or modifying them? Would it be possible to send changes as they happen at the application level?