Linux – Strange 3-second TCP connection latencies (Linux, HTTP)

apache-2.2, linux, tcp

Our webservers, which serve static content, occasionally show strange 3-second latencies. A typical ApacheBench run (more than 10000 requests, keepalive off; concurrency 1 or 40 makes no difference) looks like this:

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        2   10 152.8      3    3015
Processing:     2    8  34.7      3     663
Waiting:        2    8  34.7      3     663
Total:          4   19 157.2      6    3222

Percentage of the requests served within a certain time (ms)
  50%      6
  66%      7
  75%      7
  80%      7
  90%      9
  95%     11
  98%    223
  99%    225
 100%   3222 (longest request)

I have tried many things:
– Apache2 2.2.9 with worker or prefork MPM, no difference (with KeepAliveTimeout 10-15)
– Nginx 0.6.32
– various tcp parameters (net.core.somaxconn=3000, net.ipv4.tcp_sack=0, net.ipv4.tcp_dsack=0)
– putting the files/DocumentRoot on tmpfs
– shorewall on or off (i.e. empty iptables or not)
– AllowOverride None is set for /, so there are no .htaccess checks (verified with strace)
– the problem persists whether the webservers are accessed directly or through a Foundry load balancer

The kernel is 2.6.32 (Debian Lenny backports), but the problem also occurred with 2.6.26. IPv6 is enabled but not used.

Does this issue look familiar to anyone? Help and suggestions are much appreciated. It looks a bit like a SYN,ACK packet is getting lost or ignored.
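
One way to test that suspicion is to time the TCP handshake on its own, without any HTTP request. The Connect maximum of 3015 ms above would be consistent with a retransmitted handshake packet, since RFC 2988 recommended an initial retransmission timeout of 3 seconds, but that is only a guess until a capture confirms it. The following is a minimal sketch, not a replacement for ApacheBench: the function names, the 1-second threshold, and the address 192.0.2.10 (a documentation-only address) are all assumptions made for illustration.

```python
import socket
import time


def time_connect(host, port, timeout=10.0):
    """Return the seconds spent establishing one fresh TCP connection."""
    start = time.monotonic()
    # create_connection() returns once the three-way handshake has completed.
    with socket.create_connection((host, port), timeout=timeout):
        elapsed = time.monotonic() - start
    return elapsed


def find_slow_connects(host, port, attempts=1000, threshold=1.0,
                       timer=time_connect):
    """Open `attempts` connections one at a time and collect the slow ones.

    Returns a list of (attempt_index, seconds) for every connect that took
    at least `threshold` seconds. `timer` can be swapped out so the loop
    can be exercised without a network.
    """
    slow = []
    for attempt in range(attempts):
        elapsed = timer(host, port)
        if elapsed >= threshold:
            slow.append((attempt, elapsed))
    return slow


# Hypothetical usage against the affected server (address is a placeholder):
#
#   for attempt, seconds in find_slow_connects("192.0.2.10", 80):
#       print(f"attempt {attempt}: connect took {seconds:.3f} s")
```

If the slow attempts cluster just above 3 seconds while the rest stay in the low milliseconds, the delay is in connection setup rather than in the webserver, which narrows down what to look for in a packet capture.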

Best Answer

Capture the event with tcpdump, Wireshark, or tshark. Then open the capture in Wireshark and go to Statistics -> TCP stream graph -> Time-sequence graph (Stevens).

This gives you a graph of sequence numbers against time. If a connection has a 3-second gap, it should be easy to spot: there will be no dots for 3 seconds along the x-axis, between two dense groups of dots. Click the last dot on the left side of the gap to jump to the frame just before the gap. That frame is usually the one that reveals the problem: you might see a zero-window packet, a missing packet, out-of-order delivery, duplicates, and so on.
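
When a capture contains many connections, the same "find the frame just before the gap" step can be done programmatically. The sketch below assumes you have already exported (frame number, timestamp) pairs for a single TCP stream from the capture; the function name, the 1-second threshold, and the sample data are made up for illustration and are not taken from the capture discussed here.

```python
def find_gaps(frames, min_gap=1.0):
    """Locate silences in one TCP stream.

    `frames` is an iterable of (frame_number, epoch_seconds) pairs.
    Returns a list of (frame_before_gap, frame_after_gap, gap_seconds)
    for every pair of consecutive frames at least `min_gap` seconds apart.
    """
    # Sort by timestamp so the input does not have to be in capture order.
    ordered = sorted(frames, key=lambda frame: frame[1])
    gaps = []
    for (prev_no, prev_ts), (next_no, next_ts) in zip(ordered, ordered[1:]):
        gap = next_ts - prev_ts
        if gap >= min_gap:
            gaps.append((prev_no, next_no, gap))
    return gaps


# Made-up timestamps: one frame, a long silence, then the rest of the stream.
example = [(1, 0.000), (2, 3.001), (3, 3.002), (4, 3.004)]
for before, after, gap in find_gaps(example):
    print(f"{gap:.3f} s of silence after frame {before} (next frame: {after})")
```

The first number in each result is the frame to inspect in Wireshark, the equivalent of clicking the last dot on the left side of the gap.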