Ssh – Why is scp with compression slower than without

file-transfer scp ssh

I needed to transfer a 20 GB KVM vdisk file, which stores the root filesystem of a CentOS 6.5 VM, from one lab server to another. Because the file is large, and because I had once compressed a similar vdisk file down to a few hundred megabytes, I instinctively enabled compression with scp, but I was surprised to see a rather low transfer speed. I then tried bzip2 in combination with ssh and cat, and the result startled me. Here is a summary of the methods and their average throughput.

  • scp -C vm1-root.img root@192.168.161.62:/mnt/vdisks/, 11 MB/s.
  • bzip2 -c vm1-root.img | ssh -l root 192.168.161.62 "bzip2 -d -c > /mnt/vdisks/vm1-root.img", 5 MB/s. This even lower result prompted me to search the Net.
  • scp -c arcfour -C vm1-root.img root@192.168.161.62:/mnt/vdisks/, 13 MB/s. The use of -c arcfour was suggested in an answer on Server Fault. It hardly helped. Finally, I disabled compression.
  • scp vm1-root.img root@192.168.161.62:/mnt/vdisks/, 23 MB/s.

Shouldn't compression have been faster?


After receiving the ssh(1) man page tip from @sven, I tried a couple of alternative file transfer methods that do not involve compression, both with better results.

  • cat vm1-root.img | ssh -l root 192.168.161.62 "cat > /mnt/vdisks/vm1-root.img", 26 MB/s.

  • nc -l 5678 > /mnt/vdisks/vm1-root.img on the receiver and nc 192.168.161.62 5678 < vm1-root.img on the transmitter, 40 MB/s. The port 5678 is an arbitrary one that was available.

Using nc turned out to be the fastest copying method!

In the past, scp -C has worked very well whenever I expected it to, for example when transferring a few GB of syslogs (/var/log/messages*). An uncompressed transfer rate of a few hundred KB/s would increase to 1-2 MB/s. That example falls under the slow-connection case that the man page points out.

I have a case where a newly created vdisk image for a 20 GB partition has a compressed size of just 200 MB. At a transfer rate of about 25 MB/s, copying the compressed file would take just 8 seconds instead of over 13 minutes! Clearly, scp without compression is inefficient in this case, and scp -C is even worse.
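The arithmetic behind those two figures can be checked with a few lines of shell. This is only a rough sketch: transfer_seconds is a hypothetical helper name, and counting 20 GB as 20 * 1024 MB is an assumption.

```shell
#!/bin/sh
# transfer_seconds SIZE_MB RATE_MB_PER_S
# Hypothetical helper: prints the transfer time in whole seconds, rounded up.
transfer_seconds() {
    size_mb=$1
    rate=$2
    echo $(( (size_mb + rate - 1) / rate ))
}

transfer_seconds 200 25      # compressed image: prints 8
transfer_seconds 20480 25    # full image, assuming 20 * 1024 MB: prints 820 (about 13.7 minutes)
```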

I guess the main lesson learned here is that scp -C should be thought of only as a convenience. If a file can be significantly compressed, then it is better to first compress it on the source, transfer the compressed form, and finally decompress it on the destination. Tools that compress and decompress quickly (e.g. pbzip2) will be of greater help.
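One way to judge whether a file "can be significantly compressed" before committing to that workflow is to measure it locally. The sketch below is an illustration, not a benchmark: compressed_percent is a hypothetical helper name, and gzip -1 is used only as a fast stand-in for whatever compressor would actually be chosen.

```shell
#!/bin/sh
# compressed_percent FILE
# Hypothetical helper: prints the gzip -1 compressed size as a whole
# percentage of the original size (lower means more compressible).
compressed_percent() {
    file=$1
    # Wrap wc output in arithmetic expansion to strip any padding spaces.
    orig=$(( $(wc -c < "$file") ))
    comp=$(( $(gzip -1 -c "$file" | wc -c) ))
    # Guard against empty files to avoid dividing by zero.
    if [ "$orig" -eq 0 ]; then
        echo 100
        return
    fi
    echo $(( comp * 100 / orig ))
}

# Example (reads the whole file, so it takes a while on a 20 GB image):
#   compressed_percent vm1-root.img
```

A result of only a few percent suggests that compressing first is worth the extra step; a result near 100 suggests sending the file as-is.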

Best Answer

Quoting man ssh (scp runs on top of ssh):

Compression is desirable on modem lines and other slow connections, but will only slow down things on fast networks.

The problem is that, on a fast network, compressing the data takes more time than just sending it uncompressed.
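This trade-off can be made concrete with a deliberately simplified model. It assumes compression and sending are pipelined, so the end-to-end rate is the smaller of the compressor's input rate and the network rate multiplied by the compression ratio. The helper name effective_rate and all numbers below are illustrative assumptions, loosely based on the figures in the question, not measurements.

```shell
#!/bin/sh
# effective_rate COMPRESSOR_RATE NETWORK_RATE RATIO
# Hypothetical helper: all rates use the same unit (KB/s here); RATIO is
# original size divided by compressed size. Prints the rate at which the
# original data moves under the simplified pipelined model.
effective_rate() {
    compressor=$1
    network=$2
    ratio=$3
    wire_limit=$(( network * ratio ))
    if [ "$compressor" -lt "$wire_limit" ]; then
        echo "$compressor"      # the compressor is the bottleneck
    else
        echo "$wire_limit"      # the network is the bottleneck
    fi
}

# Fast LAN (23000 KB/s raw), compressor assumed to manage 5000 KB/s:
effective_rate 5000 23000 100   # prints 5000, slower than the raw 23000
# Slow link (300 KB/s raw), same compressor, assumed ratio of 5:
effective_rate 5000 300 5       # prints 1500, faster than the raw 300
```

Under these assumptions compression only helps while the network, not the compressor, is the limiting factor.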