Linux – Network Packet Out-of-Order/Duplicate ACKs


I'm having a serious network performance problem in a new DataCenter that we just moved into, and I'm at a loss as to how to proceed, so I'm looking for some inspiration.

The DataCenter has a managed network to which I do not have access, but we are in charge of managing our hosts within it.

General Information

  • We have eleven hosts (all Debian Squeeze) in the DataCenter environment (Dell R210s and R710s).
  • Each host has two active interfaces (eth0 and eth1), which are set up as bond0 in an active/passive configuration.
  • The networking stack on the hosts is largely the default Debian setup; we aren't running any performance optimizations or similar.
  • The problem is identical and reproducible on any of the eleven hosts, but it only applies to traffic that crosses the network boundary (i.e. it does not apply to traffic between the internal hosts themselves).
  • The DataCenter support team has hooked a laptop up to the same switch as the rest of the hosts and is unable to recreate the issue, and has therefore said that it must be a configuration issue with the internal hosts themselves and NOT a problem with the network.
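Since the bonding mode matters here (a round-robin bond can itself reorder packets, an active/passive one should not), it may be worth confirming what the kernel reports. The sketch below assumes the bond is named bond0 as described above; `bond_summary` is a hypothetical helper name, and it only parses the two fields shown.

```shell
# Hypothetical helper: summarise a bonding status file read from stdin.
# Intended input is the contents of /proc/net/bonding/bond0.
bond_summary() {
  awk -F': ' '
    /^Bonding Mode:/           { mode = $2 }     # e.g. "fault-tolerance (active-backup)"
    /^Currently Active Slave:/ { active = $2 }   # e.g. "eth0"
    END { printf "mode=%s active=%s\n", mode, active }
  '
}

# Usage on a host (assumes the bond device is bond0):
#   bond_summary < /proc/net/bonding/bond0
```

A mode line containing "active-backup" would match the active/passive setup described in the list; a mode containing "round-robin" would be worth a closer look.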

The Problem

On outbound transfers, the transfer starts quickly with a large window size, but on the remote server packets are received out-of-order and this in turn causes duplicate ACKs to be sent out. In short order the window size has shrunk massively (it stabilizes between 40,000 and 60,000 bytes) and the transfer has gone from megabytes per second down to ~200-300KB/sec.
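As a rough sanity check on those numbers: if a transfer is limited by its window, throughput is roughly window size divided by round-trip time, so the figures above imply an RTT. The sketch below is only that back-of-the-envelope calculation; `implied_rtt_ms` is a hypothetical helper, it takes 1 KB as 1000 bytes, and the real RTT to the end-points is not stated in the post.

```shell
# implied_rtt_ms WINDOW_BYTES RATE_BYTES_PER_S
# Prints the round-trip time (ms) at which a window-limited transfer
# with this window would run at this rate: rtt = window / rate.
implied_rtt_ms() {
  awk -v w="$1" -v r="$2" 'BEGIN { printf "%.0f\n", w * 1000 / r }'
}

implied_rtt_ms 40000 200000   # low end of both ranges  -> 200
implied_rtt_ms 60000 300000   # high end of both ranges -> 200
```

If the RTT reported by ping or mtr to an end-point is in that neighbourhood, the slow rate is consistent with the shrunken window being the bottleneck; if the RTT is much lower, something else is also holding the transfer back.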

On inbound transfers, everything is "fine" (where "fine" is defined as 2MB/sec sustained transfer rates).

So, an SCP transfer of a 20MB file OUT of the datacenter will start at ~2.2MB/sec but drop off to ~275KB/sec and will take 01m14s in total, while the SCP transfer of the same 20MB file INTO the datacenter will start at ~2.2MB/sec, remain stable between ~2.0-2.2MB/sec and finish in 00m09s.
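The average rates implied by those two timings can be worked out directly. This is just the arithmetic, with 1 MB taken as 1024 KB; `rate_kb_per_s` is a hypothetical helper name.

```shell
# rate_kb_per_s SIZE_MB SECONDS
# Prints the average transfer rate in KB/s (1 MB = 1024 KB).
rate_kb_per_s() {
  awk -v mb="$1" -v s="$2" 'BEGIN { printf "%.0f\n", mb * 1024 / s }'
}

rate_kb_per_s 20 74   # outbound, 20 MB in 01m14s -> 277
rate_kb_per_s 20 9    # inbound,  20 MB in 00m09s -> 2276
```

So the outbound transfer averages roughly an eighth of the inbound rate over the whole file.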

What I Have Tried

  • I've verified that there is no negotiation confusion between the hosts and the network hardware — all links are seen as 1GbE full duplex by all parties.
  • I've tried disabling window scaling.
  • I've tried shrinking net.ipv4.tcp_rmem and net.ipv4.tcp_wmem from their Debian defaults.
  • I've tried disabling bond0 and just transferring the files over a plain-jane eth0 interface.
  • I've tried transferring to multiple far-flung external end-points; all share the same problem (i.e. I am sure that the problem is on the DataCenter end, and not on the other end).
  • I've run mtr checks of the routes to the external end-points (I can replicate the problem on all of them). The routes are disparate (not at all similar after a few hops), and some of them show some level of packet loss. However, the behaviour is so similar across all of the end-points, despite their dissimilar routes and dissimilar levels of packet loss, that I believe the problem isn't the fault of anything more than three or four hops from the internal DC: those are the hops common to every route, and they don't show any significant levels of packet loss.

Below is a traffic analysis of inbound/outbound traffic (from the perspective of the host in the DC). As you can see there are (very) regular duplicate ACKs that keep the transfer speed far below what it should be. Also note that on an inbound transfer, the same problem does not occur.

tshark -r outbound-bond0.pcap -q -z io,stat,1,\
  "COUNT(tcp.analysis.retransmission) tcp.analysis.retransmission",\
  "COUNT(tcp.analysis.duplicate_ack)tcp.analysis.duplicate_ack",\
  "COUNT(tcp.analysis.lost_segment) tcp.analysis.lost_segment",\
  "COUNT(tcp.analysis.fast_retransmission) tcp.analysis.fast_retransmission"
===================================================================
IO Statistics
Interval: 1.000 secs
Column #0: COUNT(tcp.analysis.retransmission) tcp.analysis.retransmission
Column #1: COUNT(tcp.analysis.duplicate_ack)tcp.analysis.duplicate_ack
Column #2: COUNT(tcp.analysis.lost_segment) tcp.analysis.lost_segment
Column #3: COUNT(tcp.analysis.fast_retransmission) tcp.analysis.fast_retransmission
                |   Column #0    |   Column #1    |   Column #2    |   Column #3    
Time            |          COUNT |          COUNT |          COUNT |          COUNT 
000.000-001.000                 8               22                0                2 
001.000-002.000                 4               28                0                3 
002.000-003.000                 4               33                0                4 
003.000-004.000                 4               25                0                3 
004.000-005.000                 3               28                0                3 
005.000-006.000                 4               38                0                4 
006.000-007.000                 6               22                0                4 
007.000-008.000                 4               14                0                2 
008.000-009.000                 5               33                0                4 
009.000-010.000                 1               10                0                1 
010.000-011.000                 4               25                0                2 
011.000-012.000                 2               25                0                2 
012.000-013.000                 3               35                0                3 
013.000-014.000                 2               23                0                2 
014.000-015.000                 4               50                0                4 
015.000-016.000                 3               22                0                2 
016.000-017.000                 5               28                0                3 
017.000-018.000                 3               29                0                3 
018.000-019.000                 3               31                0                3 
019.000-020.000                 5               17                0                2 
020.000-021.000                 4               40                0                4 
021.000-022.000                 7               27                0                3 
022.000-023.000                 5               37                0                4 
023.000-024.000                10               17                0                1 
024.000-025.000                 3               10                0                1 
025.000-026.000                 4                9                0                2 
026.000-027.000                 3               10                0                1 
027.000-028.000                 4               47                0                4 
028.000-029.000                 5               35                0                4 
029.000-030.000                 3               14                0                2 
030.000-031.000                 9               24                0                3 
031.000-032.000                 4               20                0                3 
032.000-033.000                 6               37                0                5 
033.000-034.000                 3               19                0                3 
034.000-035.000                 3               17                0                1 
035.000-036.000                 3               42                0                3 
036.000-037.000                 6               49                0                5 
037.000-038.000                 1                7                0                1 
038.000-039.000                 9               59                0                6 
039.000-040.000                 3               23                0                3 
040.000-041.000                 1               12                0                1 
041.000-042.000                 4               39                0                2 
042.000-043.000                 6               15                0                0 
043.000-044.000                 2               25                0                2 
044.000-045.000                 3               41                0                3 
045.000-046.000                 1                8                0                1 
===================================================================

tshark -r inbound-bond0.pcap -q -z io,stat,1,\
  "COUNT(tcp.analysis.retransmission) tcp.analysis.retransmission",\
  "COUNT(tcp.analysis.duplicate_ack)tcp.analysis.duplicate_ack",\
  "COUNT(tcp.analysis.lost_segment) tcp.analysis.lost_segment",\
  "COUNT(tcp.analysis.fast_retransmission) tcp.analysis.fast_retransmission"
===================================================================
IO Statistics
Interval: 1.000 secs
Column #0: COUNT(tcp.analysis.retransmission) tcp.analysis.retransmission
Column #1: COUNT(tcp.analysis.duplicate_ack)tcp.analysis.duplicate_ack
Column #2: COUNT(tcp.analysis.lost_segment) tcp.analysis.lost_segment
Column #3: COUNT(tcp.analysis.fast_retransmission) tcp.analysis.fast_retransmission
                |   Column #0    |   Column #1    |   Column #2    |   Column #3    
Time            |          COUNT |          COUNT |          COUNT |          COUNT 
000.000-001.000                 0                0                0                0 
001.000-002.000                 0                0                0                0 
002.000-003.000                 0                0                0                0 
003.000-004.000                 0                0                0                0 
004.000-005.000                 0                0                0                0 
005.000-006.000                 0                0                0                0 
006.000-007.000                 0                0                0                0 
007.000-008.000                 1               26                1                0 
008.000-009.000                 1               70                0                1 
009.000-010.000                21              184                5                4 
010.000-011.000                 4               42                4                2 
011.000-012.000                 9               48                3                2 
012.000-013.000                 0                0                0                0 
013.000-014.000                 0                0                0                0 
014.000-015.000                 1               29                1                1 
===================================================================
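To compare the two captures without eyeballing the tables, the per-second rows can be totalled. The sketch below assumes the io,stat layout shown above (an interval followed by four whitespace-separated counts); other tshark versions may format the table differently, and `iostat_summary` is a hypothetical helper name.

```shell
# Hypothetical helper: total the four COUNT columns of a tshark io,stat
# table read from stdin, in the column order used above
# (retransmission, duplicate_ack, lost_segment, fast_retransmission).
iostat_summary() {
  awk '
    # Data rows start with an interval such as 000.000-001.000;
    # header and separator lines do not match and are skipped.
    $1 ~ /^[0-9]+\.[0-9]+-[0-9]+\.[0-9]+$/ {
      n++; retrans += $2; dupack += $3; lost += $4; fast += $5
    }
    END {
      if (n == 0) { print "no data rows"; exit 1 }
      printf "seconds=%d retrans=%d dup_acks=%d lost=%d fast_retrans=%d dup_acks_per_s=%.1f\n",
             n, retrans, dupack, lost, fast, dupack / n
    }
  '
}

# Usage: pipe the tshark command from above into the helper, e.g.
#   tshark -r outbound-bond0.pcap -q -z io,stat,1,... | iostat_summary
```

Running it once per capture gives one line each for the outbound and inbound transfers, which makes the difference in duplicate ACKs per second easier to quote to the provider.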

Frankly, I'm at an utter loss. Suggestions for what to try next are very welcome.

Best Answer

If you're sure that the problem is caused by out-of-order packets, then I can easily think of one thing that would cause your packets to go out of order: a multi-link EtherChannel somewhere between you and the edge of the DC, configured for per-packet round-robin load balancing. Ask your provider to look for that specifically.
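One way to gather evidence for the provider is to capture on the remote (receiving) end and count data segments that arrive with a sequence number below the highest one already seen. The sketch below is a rough indicator only: it counts retransmissions as well as genuinely reordered segments, ignores sequence-number wraparound, and assumes a single TCP stream. `count_reordered`, the capture file name and the address in the usage comment are all hypothetical.

```shell
# Hypothetical helper: read one TCP sequence number per line, in capture
# order, and count segments that arrive below the highest sequence
# number seen so far.
count_reordered() {
  awk '
    NF == 0 { next }                              # skip blank lines
    { total++ }
    total > 1 && $1 + 0 < max { late++ }          # arrived after a later segment
    total == 1 || $1 + 0 > max { max = $1 + 0 }   # track the highest seen
    END { printf "segments=%d below_highest_seen=%d\n", total, late + 0 }
  '
}

# Usage (file name and address are placeholders; -Y is the display-filter
# option on current tshark releases, older releases used -R instead):
#   tshark -r remote-end.pcap -Y 'tcp.len > 0 && ip.src == 203.0.113.10' \
#     -T fields -e tcp.seq | count_reordered
```

A high count on outbound transfers, together with a near-zero count on a transfer between two internal hosts, would support the per-packet load-balancing theory.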
