I would recommend tar. When the file trees are already similar, rsync performs very well. However, because rsync analyses each file before copying the changes, it is much slower than tar for the initial copy. The following command will likely do what you want: it copies the files between the machines and preserves both permissions and user/group ownership.
tar -cf - /path/to/dir | ssh remote_server 'tar -xvf - -C /absolute/path/to/remotedir'
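The same tar pipe can be exercised locally, which is a quick way to convince yourself that permissions survive the copy; the /tmp paths below are made up for the demonstration, and over ssh the receiving tar is the same, just run on the remote side:

```shell
# Copy a small tree through a tar pipe locally; all paths are illustrative.
mkdir -p /tmp/tardemo/src/sub /tmp/tardemo/dst
echo "hello" > /tmp/tardemo/src/sub/file.txt
chmod 640 /tmp/tardemo/src/sub/file.txt
# -C on the sending side keeps the archive paths relative;
# -p on the receiving side preserves the recorded modes regardless of umask.
tar -C /tmp/tardemo/src -cf - . | tar -xpf - -C /tmp/tardemo/dst
# The file arrives with its contents and 640 mode intact:
ls -l /tmp/tardemo/dst/sub/file.txt
```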
As per Mackintosh's comment below, this is the command you would use for rsync:
rsync -avW -e ssh /path/to/dir/ remote_server:/path/to/remotedir
If you can trust the filesystem's last-modified timestamps, you can speed things up by combining Rsync with the UNIX/Linux 'find' utility. 'find' can assemble a list of only the files modified within the past day, and you then feed ONLY that shortened list of files/directories to Rsync. This is much faster than having Rsync compare the metadata of every single file on the sender against the remote server.
In short, the following command will execute Rsync ONLY on the list of files and directories that have changed in the last 24 hours: (Rsync will NOT bother to check any other files/directories.)
find /local/data/path/ -mindepth 1 -ctime -1 -print0 | xargs -0 -n 1 -I {} -- rsync -a {} remote.host:/remote/data/path/.
In case you're not familiar with the 'find' command, it recurses through a specific directory subtree, looking for files and/or directories that meet whatever criteria you specify. For example, this command:
find . -name '.svn' -type d -ctime -1 -print
will start in the current directory (".") and recurse through all sub-directories, looking for:
- any directories ("-type d"),
- named ".svn" ("-name '.svn'"),
- with metadata changed less than one day (24 hours) ago ("-ctime -1").
It prints the full path name ("-print") of anything matching those criteria to standard output. The options '-name', '-type', and '-ctime' are called "tests", and the option '-print' is called an "action". The man page for 'find' has a complete list of tests and actions.
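As a concrete illustration of how the tests compose (the directory tree below is invented for the demo):

```shell
# Build a tiny tree containing two .svn directories, then find them.
mkdir -p /tmp/finddemo/.svn /tmp/finddemo/sub/.svn
touch /tmp/finddemo/.svn/entries
# -type d restricts matches to directories; -name filters by basename,
# so the 'entries' file inside .svn is not listed.
find /tmp/finddemo -name '.svn' -type d -print
```

Both .svn directories are printed; the file inside one of them is not, because it fails the '-type d' test.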
If you want to be really clever, you can use the 'find' command's '-cnewer' test instead of '-ctime' to make this process more fault-tolerant and flexible. '-cnewer' tests whether each file/directory in the tree has had its metadata modified more recently than some reference file. Use 'touch' to create the NEXT run's reference file at the beginning of each run, right before the 'find ... | rsync ...' command executes. Here's the basic implementation:
#!/bin/bash
# bash rather than plain sh, since the script relies on $RANDOM
curr_ref_file=$(ls /var/run/last_rsync_run.* 2>/dev/null)   # seed with 'touch' before the first run
next_ref_file="/var/run/last_rsync_run.$RANDOM"
touch "$next_ref_file"
find /local/data/path/ -mindepth 1 -cnewer "$curr_ref_file" -print0 | xargs -0 -n 1 -I {} -- rsync -a {} remote.host:/remote/data/path/.
rm -f "$curr_ref_file"
This script automatically knows when it was last run, and it only transfers files modified since the last run. While this is more complicated, it protects you against situations where you might have missed running the job for more than 24 hours, due to downtime or some other error.
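The '-cnewer' behaviour itself is easy to verify in isolation; the temp directory below is an assumption made up for the demonstration:

```shell
# Only entries whose status changed AFTER the reference file should match.
mkdir -p /tmp/cnewerdemo
touch /tmp/cnewerdemo/old.txt
touch /tmp/cnewerdemo/ref
sleep 1                       # ensure a strictly later ctime for the next file
touch /tmp/cnewerdemo/new.txt
# Prints only new.txt: old.txt predates ref, and ref is not newer than itself.
find /tmp/cnewerdemo -mindepth 1 -cnewer /tmp/cnewerdemo/ref -print
```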
robocopy.exe
has a switch, /IPG (inter-packet gap), which lets you insert a time window between the packets of your copy and thereby reduce the impact on the channel. It's not exactly "use no more than 30% of the available bandwidth", but you can achieve the same effect with a little math. You can always specify some number of milliseconds and let it run for a bit, then
CTRL+C
to interrupt, adjust your command as needed, then resume. I've done just this when I didn't want to overload the WAN during the business day with massive replications. robocopy has another switch,
/z
allowing for "restartable" transfers, so if the transfer is interrupted you can pick up where you left off and don't need to shift the whole 40 GB again. There are some nice GUIs for robocopy which can assist with the syntax, but anyone with a Linux background will grok it easily. Grab the latest version from a copy of Windows 2003 or later; otherwise you'll find it as a Windows 2000 Resource Kit Tools download.
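Putting the two switches together, a throttled, restartable copy might look like this; the source path and share name are made up, and the 64 ms gap is just an arbitrary starting point you'd tune against your link:

```bat
REM /E copies subdirectories (including empty ones), /Z makes the copy
REM restartable, /IPG:64 inserts a 64 ms inter-packet gap (tune to taste).
robocopy D:\data \\backupserver\share /E /Z /IPG:64
```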
In the Wikipedia entry for robocopy, someone noted that the penalty for restartable copying (the
/z
switch) is 6x slower performance (see Known Flaws).