I would use rsync: if it is interrupted for any reason, you can restart it at very little cost, and being rsync, it can even resume part way through a large file. As others mention, it can exclude files easily. The simplest way to preserve most things is to use the -a ("archive") flag. So:
rsync -a source dest
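For example (just a sketch; the --exclude patterns here are purely illustrative, and --partial tells rsync to keep partially transferred files so a restart can resume them):
rsync -a --partial --exclude='*.tmp' --exclude='cache/' source/ dest/
The trailing slash on source/ copies the directory's contents rather than the directory itself.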
Although UID/GID and symlinks are preserved by -a (see -lpgo), your question implies you might want a full copy of the filesystem information, and -a doesn't include hard links, extended attributes, or ACLs (on Linux), nor the above plus resource forks (on OS X). Thus, for a robust copy of a filesystem, you'll need to include those flags:
rsync -aHAX source dest # Linux
rsync -aHE source dest # OS X
The default cp would start again from the beginning, though the -u flag will "copy only when the SOURCE file is newer than the destination file or when the destination file is missing". And the -a (archive) flag is recursive, won't recopy files if you have to restart, and preserves permissions. So:
cp -au source dest
Sneakernet Anyone?
Assuming this is a one-time copy, I don't suppose it's possible to just copy the file to a CD (or other media) and overnight it to the destination, is there?
That might actually be your fastest option, as a file transfer of that size over that connection might not copy correctly... in which case you get to start all over again.
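If you do ship physical media (or after any long transfer), it's worth verifying the copy; a minimal sketch with sha256sum, where hugefile.tar is just a stand-in name:
sha256sum hugefile.tar > hugefile.tar.sha256   # on the source machine
sha256sum -c hugefile.tar.sha256               # on the destination, after the copy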
rsync
My second choice/attempt would be rsync as it detects failed transfers, partial transfers, etc. and can pick up from where it left off.
rsync --progress file1 file2 user@remotemachine:/destination/directory
The --progress flag will give you some feedback instead of just sitting there and leaving you to second guess yourself. :-)
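If the connection does drop, adding --partial (or -P, which is shorthand for --partial --progress) keeps the partially transferred file in place, so rerunning the same command resumes roughly where it left off instead of starting the file over. A sketch, reusing the paths above:
rsync -P file1 file2 user@remotemachine:/destination/directory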
Vuze (BitTorrent)
Third choice would probably be to try using Vuze as a torrent server and then have your remote location use a standard BitTorrent client to download it. I know of others who have done this, but you know... by the time they got it all set up, running, etc... I could have overnighted the data...
Depends on your situation I guess.
Good luck!
UPDATE:
You know, I got to thinking about your problem a little more. Why does the file have to be a single huge tarball? Tar is perfectly capable of splitting large archives into smaller ones (to span media, for example), so why not split that huge tarball into more manageable pieces and then transfer the pieces over instead?
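A minimal sketch of that idea using split (the 1G chunk size and file names are assumptions; GNU tar's multi-volume mode is another route):
split -b 1G hugefile.tar hugefile.tar.part_        # carve the existing tarball into 1 GiB pieces
cat hugefile.tar.part_* > hugefile.tar             # reassemble on the destination
tar -cf - /path/to/data | split -b 1G - backup.tar.part_   # or split while creating, skipping the giant intermediate file
Each piece can then be transferred (and retried) independently.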
Best Answer
I doubt CPU is the limiting factor here. You're most likely limited by both network bandwidth for the transfer and disk I/O, especially latency for all those stat calls.
Can you break down the filesystem hierarchy into smaller chunks to process in parallel?
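One rough way to do that, assuming GNU xargs and that the top-level directory names contain no spaces or newlines (host and paths are placeholders):
ls /source | xargs -P4 -I{} rsync -aH --partial /source/{} user@remotemachine:/destination/
The -P4 runs four rsyncs at once; tune that to what the disks and the link can actually sustain.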
What are the source files, and what's writing or modifying them? Would it be possible to send changes as they happen at the application level?