Rsync vs SCP – Efficiently Copying Huge Files Between Remote Machines


I have a shell script that repeatedly copies huge files (2 GB to 5 GB) between remote systems.
Key-based authentication with agent forwarding is used, and everything works.
For example, say the shell script runs on machine-A and copies files from machine-B to machine-C.

"scp -Cp -i private-key ssh_user@source-IP:source-path ssh_user@destination-IP:destination-path"

Now the problem is that the sshd process continuously uses a lot of CPU.
For example, top -c on the destination machine (i.e. machine-C) shows:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                         
14580 ssh_user  20   0 99336 3064  772 R 85.8  0.0   0:05.39 sshd: ssh_user@notty                                                            
14581 ssh_user  20   0 55164 1984 1460 S  6.0  0.0   0:00.51 scp -p -d -t /home/binary/instances/instance-1/user-2993/

This results in a high load average.

I believe scp is using so much CPU because it is encrypting/decrypting the data. But I don't need an encrypted transfer, as machine-B and machine-C are both on the same LAN.

What other options do I have? I considered 'rsync'. But the rsync man page says:

       Rsync  copies files either to or from a remote host, or locally on the current host (it does not support copying files between two
       remote hosts).

Edit 1: I am already using the ssh cipher arcfour128. It helps a little, but it doesn't solve my problem.

Edit 2: There are other binaries (my main application) running on the machines, and the high load average is causing them to perform poorly.

Best Answer

This problem can be solved with rsync. At least this solution should be competitive in terms of performance.

First, rsync can be called from one of the remote systems, which works around its inability to copy directly between two remote systems.
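
As a rough sketch of this first point, the script on machine-A can ask the source machine to run rsync itself, so that from rsync's point of view only one side is remote. The host names and paths are the question's placeholders, and the -A flag assumes that agent forwarding is what lets the source machine authenticate to the destination, as the question describes. The snippet only prints the command so it can be inspected before being run.

```shell
#!/bin/sh
# Placeholders taken from the question; substitute real values.
SRC_HOST="ssh_user@source-IP"
DEST_HOST="ssh_user@destination-IP"

# Build the command machine-A would run: log in to the source host
# (with agent forwarding, -A) and start rsync there, pushing to the destination.
build_push_command() {
    src_path=$1
    dest_path=$2
    printf 'ssh -A -i private-key %s "rsync -a %s %s:%s"\n' \
        "$SRC_HOST" "$src_path" "$DEST_HOST" "$dest_path"
}

build_push_command source-path destination-path
```

Note that this form still tunnels the data through ssh between the two machines, so on its own it only addresses the two-remote-hosts limitation, not the CPU cost of encryption.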

Second, encryption/decryption can be avoided by running rsync in Daemon Access mode instead of Remote Shell Access mode.

In daemon access mode rsync does not tunnel the traffic through an ssh connection. Instead it uses its own protocol on top of TCP.
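
To make the difference concrete, here is a small illustration of how the destination is written in each mode. The single-colon form goes through a remote shell (ssh), while the rsync:// URL form (or the equivalent host::module form) talks to a daemon over plain TCP. The module name "files" is a hypothetical placeholder, and port 1873 is the non-privileged port used further down in this answer. The functions only build strings; nothing is contacted.

```shell
#!/bin/sh
DEST_HOST="destination-IP"   # placeholder from the question
MODULE="files"               # hypothetical module name, not from the original
PORT=1873                    # non-privileged port used in this answer

# Remote shell access mode: user@host:path, carried over ssh (encrypted).
remote_shell_target() {
    printf '%s:%s\n' "ssh_user@$DEST_HOST" "$1"
}

# Daemon access mode: rsync://host:port/module/path, rsync protocol over TCP.
daemon_target() {
    printf 'rsync://%s:%s/%s/%s\n' "$DEST_HOST" "$PORT" "$MODULE" "$1"
}

echo "rsync -a source-path $(remote_shell_target destination-path)"
echo "rsync -a source-path $(daemon_target destination-path)"
```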

Normally you run the rsync daemon from inetd or stand-alone. Either way, this requires root access to one of the remote systems. Assuming root access is not available, it is still possible to start the daemon.

Start rsync daemon as a non-privileged user on the destination machine

ssh -i private_key ssh_user@destination-IP \
       "echo -e 'pid file = /tmp/\nport = 1873' > /tmp/rsyncd.conf"

ssh -i private_key ssh_user@destination-IP \
       rsync --config=/tmp/rsyncd.conf --daemon
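
The config written above is deliberately minimal. As a hedged sketch of what a slightly fuller non-root config might look like: an rsync daemon typically serves paths through named modules, modules are read-only by default, and chroot needs root privileges. Everything specific here is an assumption made for illustration and is not from the original answer: the pid file name rsyncd.pid, the module name "files", and the path /tmp/incoming. The snippet writes the file into a scratch directory so it can be reviewed before being copied to the destination machine.

```shell
#!/bin/sh
# Write a sample rsyncd.conf into a scratch directory for inspection.
conf_dir=$(mktemp -d)

cat > "$conf_dir/rsyncd.conf" <<'EOF'
# Global settings (pid file name is a hypothetical example)
pid file = /tmp/rsyncd.pid
port = 1873
# chroot requires root, so disable it for a non-privileged daemon
use chroot = no

# Hypothetical module; clients address it as rsync://host:1873/files/...
[files]
    path = /tmp/incoming
    # modules are read-only by default; allow uploads
    read only = no
EOF

echo "wrote $conf_dir/rsyncd.conf"
```

With no authentication configured, anyone who can reach the port can write to the module, so this only makes sense on a trusted LAN such as the one described in the question.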

Actually copy the files

ssh -i private_key ssh_user@source_ip \
       "rsync [OPTIONS] source-path \