I'm trying to figure out why TCP/IP throughput between two hosts I control is much lower than expected.
Host A: internet connectivity provided by Comcast (Residential) in Boston Metro — can consistently achieve 100Mbps download from various well known sites (e.g. downloading Google Chrome)
Host B: internet connectivity provided by Cogent; can consistently achieve near 100Mbps to and from various sites.
Test connection: Host A downloads a large file from B via rsync over ssh on a non-standard port (22442), i.e. rsync --progress -e 'ssh -p 22442' ... Throughput is only around 2.3 Mbps (292564 Bps).
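For reference, tcptrace reports throughput in bytes per second; converting that to megabits per second is a one-liner:

```shell
# Convert tcptrace's reported throughput (bytes/sec) to megabits/sec:
# 292564 B/s * 8 bits/byte / 1e6 ~= 2.34 Mbps
awk 'BEGIN { printf "%.2f Mbps\n", 292564 * 8 / 1e6 }'
```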
Other clients (e.g. a scratch EC2 instance) achieve nearly 30 times that throughput when attempting the same download from B.
I've captured statistics for the connection between A and B with tcptrace (attached below), but I'm not sure whether anything looks wrong.
I imagine there is something on my end that needs tweaking, but I also have a slight suspicion that Comcast is shaping traffic. So before I start flailing around, I thought I would seek advice: what are some good next steps for me to try? Thanks in advance.
tcptrace -r -l -o1 dump.out
1 arg remaining, starting with 'dump.out'
Ostermann's tcptrace -- version 6.6.7 -- Thu Nov 4, 2004
5763 packets seen, 5763 TCP packets traced
elapsed wallclock time: 0:00:00.019828, 290649 pkts/sec analyzed
trace file elapsed time: 0:00:18.655110
TCP connection info:
1 TCP connection traced:
TCP connection 1:
host a: XXX.XXXX.XXX:41608
host b: XXX.XXXX.XXX:22442
complete conn: RESET (SYNs: 2) (FINs: 1)
first packet: Sun May 26 14:48:52.923281 2013
last packet: Sun May 26 14:49:11.578391 2013
elapsed time: 0:00:18.655110
total packets: 5763
filename: dump.out
a->b: b->a:
total packets: 1946 total packets: 3817
resets sent: 15 resets sent: 0
ack pkts sent: 1930 ack pkts sent: 3817
pure acks sent: 1874 pure acks sent: 35
sack pkts sent: 228 sack pkts sent: 0
dsack pkts sent: 0 dsack pkts sent: 0
max sack blks/ack: 3 max sack blks/ack: 0
unique bytes sent: 4100 unique bytes sent: 5457821
actual data pkts: 55 actual data pkts: 3781
actual data bytes: 4100 actual data bytes: 5457821
rexmt data pkts: 0 rexmt data pkts: 0
rexmt data bytes: 0 rexmt data bytes: 0
zwnd probe pkts: 0 zwnd probe pkts: 0
zwnd probe bytes: 0 zwnd probe bytes: 0
outoforder pkts: 0 outoforder pkts: 23
pushed data pkts: 55 pushed data pkts: 41
SYN/FIN pkts sent: 1/1 SYN/FIN pkts sent: 1/0
req 1323 ws/ts: Y/Y req 1323 ws/ts: Y/Y
adv wind scale: 7 adv wind scale: 7
req sack: Y req sack: Y
sacks sent: 228 sacks sent: 0
urgent data pkts: 0 pkts urgent data pkts: 0 pkts
urgent data bytes: 0 bytes urgent data bytes: 0 bytes
mss requested: 1460 bytes mss requested: 1460 bytes
max segm size: 712 bytes max segm size: 1448 bytes
min segm size: 16 bytes min segm size: 21 bytes
avg segm size: 74 bytes avg segm size: 1443 bytes
max win adv: 301184 bytes max win adv: 21632 bytes
min win adv: 5888 bytes min win adv: 14592 bytes
zero win adv: 0 times zero win adv: 0 times
avg win adv: 269168 bytes avg win adv: 21615 bytes
initial window: 20 bytes initial window: 21 bytes
initial window: 1 pkts initial window: 1 pkts
ttl stream length: 4101 bytes ttl stream length: NA
missed data: 1 bytes missed data: NA
truncated data: 0 bytes truncated data: 0 bytes
truncated packets: 0 pkts truncated packets: 0 pkts
data xmit time: 18.128 secs data xmit time: 18.527 secs
idletime max: 5206.5 ms idletime max: 5285.9 ms
throughput: 220 Bps throughput: 292564 Bps
RTT samples: 57 RTT samples: 1661
RTT min: 59.0 ms RTT min: 0.0 ms
RTT max: 106.1 ms RTT max: 40.1 ms
RTT avg: 77.2 ms RTT avg: 1.4 ms
RTT stdev: 17.2 ms RTT stdev: 7.1 ms
RTT from 3WHS: 60.0 ms RTT from 3WHS: 0.0 ms
RTT full_sz smpls: 2 RTT full_sz smpls: 1
RTT full_sz min: 60.0 ms RTT full_sz min: 0.0 ms
RTT full_sz max: 70.1 ms RTT full_sz max: 0.0 ms
RTT full_sz avg: 65.1 ms RTT full_sz avg: 0.0 ms
RTT full_sz stdev: 0.0 ms RTT full_sz stdev: 0.0 ms
post-loss acks: 0 post-loss acks: 16
segs cum acked: 0 segs cum acked: 2090
duplicate acks: 0 duplicate acks: 201
triple dupacks: 0 triple dupacks: 8
max # retrans: 0 max # retrans: 0
min retr time: 0.0 ms min retr time: 0.0 ms
max retr time: 0.0 ms max retr time: 0.0 ms
avg retr time: 0.0 ms avg retr time: 0.0 ms
sdv retr time: 0.0 ms sdv retr time: 0.0 ms
A few updates
I tried a few other things.

First, I tried "rerouting" the flow by tunneling the transfer through a third host. I achieved very respectable throughput for about 30 minutes, and then the throughput decayed by a factor of roughly 20 to more or less match the original flow.

Second, I ran mtr from B to A with 1400-byte packets, and saw the following:
X.X.X (0.0.0.0)                                        Mon May 27 19:20:37 2013
Keys: Help  Display mode  Restart statistics  Order of fields  quit
                                                       Packets           Pings
 Host                                          Loss%  Snt  Last   Avg  Best  Wrst StDev
 1. X-X-X.X.X                                   0.0%  173   0.1   0.1   0.1   0.2   0.0
 2. X-X.X.X-X.X.atlas.cogentco.com              0.0%  173   0.9   0.9   0.8   4.2   0.3
 3. X-X.X.X-X.X.atlas.cogentco.com              0.0%  173   0.7  12.7   0.7 211.8  38.3
 4. te7-3.ccr01.ord07.atlas.cogentco.com        0.0%  173   1.1  12.8   0.7 196.8  39.9
 5. te0-4-0-7.mpd21.ord01.atlas.cogentco.com    0.0%  173   1.0   1.0   0.9   1.5   0.1
 6. be2006.ccr21.ord03.atlas.cogentco.com       0.0%  173   1.3   1.3   1.2   1.9   0.1
 7. be-202-pe04.350ecermak.il.ibone.comcast.net 0.0%  172  39.7  41.6  36.3 156.7  15.4
 8. he-3-2-0-0-cr01.350ecermak.il.ibone.comcast.net
                                                2.3%  172  46.9  45.0  36.8  51.4   3.6
    he-3-1-0-0-cr01.350ecermak.il.ibone.comcast.net
    he-3-0-0-0-cr01.350ecermak.il.ibone.comcast.net
    he-3-12-0-0-cr01.350ecermak.il.ibone.comcast.net
 9. he-3-6-0-0-ar01.woburn.ma.boston.comcast.net
                                                1.2%  172  68.7  67.9  64.8  72.4   1.6
10. X.X.X.X.boston.comcast.net                  1.7%  172  67.6  66.8  64.2  70.4   1.0
11. X.X.X.X.boston.comcast.net                  1.2%  172  67.5  66.4  64.0  69.2   1.0
12. X.X.X.X.comcast.net                         1.2%  172  75.7  75.2  72.0  85.8   1.9
Given the results of my first experiment, and the fact that so much packet loss is occurring between two routers controlled by Comcast in a Tier III facility (350 Cermak), I'm starting to lean more towards the "Comcast Boogeyman" explanation.
Best Answer
I'd suggest a few things:
Check for packet loss with ping. Try at least 100 pings with different packet sizes, including a larger packet around 1400 bytes.
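A sketch of that sweep (the hostname is a placeholder for host B; the awk line shows how to reduce ping's summary to a loss figure, run here on canned sample output so the parsing itself is demonstrable):

```shell
# Suggested sweep -- substitute host B's address for the placeholder:
#   for size in 56 512 1400; do ping -c 100 -s "$size" hostB.example.net | tail -2; done
# Reducing ping's summary line to a loss percentage (canned sample shown):
echo "100 packets transmitted, 97 received, 3% packet loss, time 9910ms" |
  awk -F', *' '{ split($1, t, " "); split($2, r, " ");
                 printf "%.1f%% loss\n", 100 * (1 - r[1] / t[1]) }'
```

Loss that appears only at larger packet sizes would point at an MTU or fragmentation issue rather than congestion.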
A round-trip time (RTT) of ~70 ms could be quite high, depending on distance. Make sure to look at a traceroute going each way. In the output you pasted, your traffic from a -> b appears to be suffering from latency but not the other way around, which could suggest a problem with the path from a -> b while the return path from b -> a is fine.
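One quick sanity check on that RTT: TCP throughput is bounded by the receive window divided by the RTT. Plugging in the numbers from the trace above (A's max advertised window of 301184 bytes, ~77 ms average RTT):

```shell
# TCP throughput ceiling = receive window / RTT.
# Window and RTT taken from the tcptrace output above (bytes, seconds).
awk 'BEGIN {
  win = 301184   # max win adv from A, in bytes
  rtt = 0.077    # avg RTT a->b, in seconds
  printf "ceiling: %.1f Mbps\n", win * 8 / rtt / 1e6
}'
```

That ceiling (~31 Mbps) is below 100 Mbps but still far above the ~2.3 Mbps observed, so the advertised window alone doesn't explain the slowdown.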
Look at the techniques in this old paper for analysing TCP throughput.
The paper is a little out of date (Ethereal is now Wireshark), but the techniques are still sound.
5 seconds of idle time against 18 seconds of wall-clock time could suggest the connection going idle when a TCP window buffer (either send or receive) fills up, which would contribute to poor throughput. However, the very asymmetric latency is the first thing that stands out, so I would focus on that first. Good luck, and thanks for introducing me to tcptrace; I hadn't seen it before.
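If you want to rule out window-buffer limits on either end, the kernel's TCP autotuning bounds are easy to inspect (these paths are standard on Linux; the values shown will vary per system, and raising them requires root via sysctl -w):

```shell
# Inspect Linux TCP buffer autotuning limits (min / default / max, in bytes).
cat /proc/sys/net/ipv4/tcp_rmem   # receive-buffer bounds
cat /proc/sys/net/ipv4/tcp_wmem   # send-buffer bounds
cat /proc/sys/net/core/rmem_max   # hard cap on an explicit SO_RCVBUF
```

If the max receive buffer is well below the bandwidth-delay product of the path, autotuning can never open the window wide enough for full throughput.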