How to find the bottleneck while transferring huge files between 2 hosts

bandwidth, bottleneck

We frequently need to transfer huge files (upwards of 50 GB) between two hosts, and the transfer rate never seems to reach the expected throughput for the network. There are several points that could be the bottleneck, but each of their theoretical upper limits is well above the actual transfer rate. Here's a typical setup:

Laptop -> 802.11n -> AP -> CAT 6 cable -> 10/100 Mbit router -> Desktop

In this connection, the bottleneck is clearly the router, which limits the transfer rate to 100 Mbit/s. Even then, I rarely see a transfer rate (with scp) exceeding 9.5 MB/s, which is 76 Mbit/s, or only 76% of the theoretical maximum.

Can there really be a 24% overhead at the access point, or is there something else limiting the speed? It could be disk I/O (although SATA is rated at 1.5 Gbps), or anything on the motherboard between the disk and the NIC (how can I measure that?).

Is there a way to know for sure(*) where the bottleneck is? If I can't get more than 76 Mbps from a 100 Mbps router, will upgrading the network to gigabit increase throughput or will I still get 76 Mbps because the bottleneck is elsewhere?

(*) or at least in a way convincing enough that a boss would agree to invest to upgrade that one part of the network

Best Answer

Your problem is that you are testing too many things at once:

  • disk read speed
  • SSH encryption
  • wireless
  • SSH decryption
  • disk write speed

Since you mentioned SSH, I am going to assume this is a Unix-like system...

You can rule out any problems with disk read speed with a simple

dd if=yourfile of=/dev/null bs=1M # or
pv yourfile > /dev/null

On the receiving end you can do a simple disk write test

dd if=/dev/zero of=testfile bs=1M count=2000 conv=fdatasync # or
dd if=/dev/zero bs=1M count=2000 | pv > testfile

dd is not really a "benchmark", but since scp does sequential I/O, it is close enough; conv=fdatasync in the write test makes dd flush to disk before reporting, so you are measuring the disk rather than the page cache.
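There is a similar caveat on the read side: if the file was read recently, it may be served from the page cache rather than the disk. A quick sketch, assuming a Linux host where you have root, is to drop the caches before the read test:

sync                                         # flush dirty pages first
echo 3 | sudo tee /proc/sys/vm/drop_caches   # drop page cache, dentries and inodes (Linux-specific)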

You can also test SSH by doing something like

dd if=/dev/zero bs=1M count=100 | ssh server dd of=/dev/null # or
dd if=/dev/zero bs=1M count=100 | pv | ssh server dd of=/dev/null
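If the SSH pipe turns out to be much slower than the raw nc test below, encryption could be the limit. One way to check, assuming your OpenSSH build offers the aes128-ctr cipher (any cipher listed by ssh -Q cipher works here), is to rerun the same test while forcing a cipher with -c and compare:

dd if=/dev/zero bs=1M count=100 | ssh -c aes128-ctr server dd of=/dev/null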

Finally, to rule out SSH as the bottleneck, you can use nc to test the raw network performance

server$ nc -l 1234 > /dev/null
client$ dd if=/dev/zero bs=1M count=100 | pv | nc server 1234 # or
client$ dd if=/dev/zero bs=1M count=100 | nc server 1234
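If you would rather watch the rate on the receiving end, pv can sit on that side instead; same idea, just a different placement:

server$ nc -l 1234 | pv > /dev/null
client$ dd if=/dev/zero bs=1M count=100 | nc server 1234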

If you really want to test the network properly, install and use something like iperf, but nc is a good start.
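For example, with iperf3 (assuming it is available in your package manager), one host runs the server and the other the client:

server$ iperf3 -s
client$ iperf3 -c server -t 30      # 30-second TCP test, client sends to server
client$ iperf3 -c server -t 30 -R   # -R reverses the direction, server sends to client

iperf3 reports throughput per interval plus a summary, which is a more convincing number to show a boss than eyeballing pv.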

I'd start with the nc test as that rules out the most things at once. You should also definitely run the tests over a wired connection rather than wireless. 802.11n can easily max out a 100 Mbit port, but only if it is properly set up.
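It is also worth confirming what the links actually negotiated before blaming any one component; the interface names below (eth0, wlan0) are just placeholders for whatever your hosts use:

ethtool eth0        # wired side: look for the Speed: line (100Mb/s vs 1000Mb/s) and Duplex: Full
iw dev wlan0 link   # wireless side: shows the negotiated tx bitrate

A port stuck at half duplex or a wireless link negotiated at a low bitrate will cap the transfer regardless of everything else.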

(Ubuntu >= 12.04 defaults to netcat-openbsd. nc -l -p 1234 > /dev/null may be what you want if you're using netcat-traditional).
