TCP retransmission delays – lost acks

latency, linux-networking, performance-tuning, tcp

Maybe somebody will be able to help me out with this. I'm trying to find out if there is anything that can be optimized server-side to reduce delays in case of packet loss.

Environment: Windows 2012 client, CentOS 6.x server [Couchbase], same datacenter, busy LAN with firewalls to traverse. Both are large physical servers with plenty of spare capacity.

Issue: as measured from the client, response times are nicely distributed around ~1ms, but we see a spike at ~200ms.

A network trace shows this:

  1. Client sends the request.
  2. Server replies (after ~1 ms) with a single packet carrying the application response plus the TCP ACK for the request (78 bytes in this case).
  3. That packet is NOT received by the client.
  4. After ~30 ms, the client TCP stack retransmits the original request.
  5. The server replies immediately with a DUP ACK (66 bytes, does not contain the application response).
  6. About 200 ms after the initial request, the server retransmits the original response (78-byte packet).
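
To make the gaps explicit, here is a minimal sketch that lays the trace out as a timeline. The timestamps are the approximate figures from the steps above (the DUP ACK is assumed to go out at ~30 ms, since the server replies "immediately"), and the names `events` and `gaps` are purely illustrative.

```python
# Approximate timestamps in ms since the client sent the request,
# read off the trace described above (not measured values).
events = [
    (0, "client", "sends request"),
    (1, "server", "sends response + ACK (78 bytes), lost in transit"),
    (30, "client", "retransmits request"),
    (30, "server", "sends DUP ACK (66 bytes, no payload)"),
    (200, "server", "retransmits response (78 bytes)"),
]


def gaps(timeline):
    """Return (description, ms elapsed since the previous event) pairs."""
    out = []
    prev = timeline[0][0]
    for t, who, what in timeline:
        out.append((f"{who} {what}", t - prev))
        prev = t
    return out


# Latency as seen by the client: request sent -> response finally delivered.
client_latency_ms = events[-1][0] - events[0][0]
# Time between the server's first (lost) response and its retransmission.
server_wait_ms = events[-1][0] - events[1][0]

for desc, delta in gaps(events):
    print(f"+{delta:>3} ms  {desc}")
print(f"client-visible latency: ~{client_latency_ms} ms")
print(f"server waited ~{server_wait_ms} ms before retransmitting")
```

The point of the layout: the client's retransmission at ~30 ms only produces a DUP ACK, and nearly all of the remaining delay is the server waiting on its own timer before resending the response.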

Any idea where this 200 ms delay comes from, and how to reduce it? I'd guess some combination of TCP delayed ACKs, Nagle and the congestion/RTO algorithms, but Linux kernel tuning is a bit of a mystery to me.

Any suggestions?

Best Answer

Yes: Wireshark on both sides, tcpdump, network traces taken at the switch level (rather high-end Arista 10G switches), traces taken on the firewall (Fortinet), etc.

The problem is not why the client is not receiving the reply. This is a busy network with bursty traffic, so losing one packet in 10,000 is not unexpected. But I need to provide an SLA even when I lose a packet, and this 200 ms of delay is throwing it off.

Experimenting on DEV, I can 'fix' the problem by setting the TCP RTO for the client subnet to 5 ms via a route command [server-side]. With this, 99.999% of my requests get answered in under 10 ms, and I would meet my SLA. Fine, but what are the drawbacks of doing this in production? Is the RTO the real issue, or am I fixing it by accident? Is this the best possible fix, or is there something smarter/better (tuned profile? sysctl parameter? prayer to the minix gods?)?
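
For what it's worth, the numbers are consistent with the retransmission timer being clamped to a floor. Below is a rough sketch of that idea, not the kernel's actual estimator: it uses a simplified RFC 6298-style formula (SRTT + 4 * RTTVAR, never below a minimum). The 200 ms default is Linux's `TCP_RTO_MIN`; the RTT and variance figures are assumed LAN values, and `estimate_rto_ms` is a made-up helper name.

```python
# Linux's built-in floor for the retransmission timeout (TCP_RTO_MIN = HZ/5).
LINUX_DEFAULT_RTO_MIN_MS = 200.0


def estimate_rto_ms(srtt_ms, rttvar_ms, rto_min_ms=LINUX_DEFAULT_RTO_MIN_MS):
    """Simplified RFC 6298-style RTO: SRTT + 4*RTTVAR, clamped to a minimum.

    The real kernel code differs in detail (it floors the variance term,
    so the result sits slightly above the minimum), but the effect on a
    low-latency LAN is the same: the floor dominates.
    """
    return max(rto_min_ms, srtt_ms + 4.0 * rttvar_ms)


# Assumed figures for a ~1 ms LAN round trip; not taken from a real socket.
SRTT_MS = 1.0
RTTVAR_MS = 0.5

default_rto = estimate_rto_ms(SRTT_MS, RTTVAR_MS)
tuned_rto = estimate_rto_ms(SRTT_MS, RTTVAR_MS, rto_min_ms=5.0)

print(f"default floor: RTO ~{default_rto:.0f} ms")
print(f"5 ms floor:    RTO ~{tuned_rto:.0f} ms")
```

Under these assumptions the measured RTT contributes only ~3 ms, so the timeout is set almost entirely by the floor. That would explain both the ~200 ms spike and why lowering the per-route minimum (the `rto_min` attribute in iproute2, if that is what the route command sets) brings it down to a few milliseconds.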

Thanks
