Linux server resend SYN ACK

kernel linux socket tcpip

I am troubleshooting a connection timeout problem between two Linux boxes, where it seems the ACK to SYN-ACK was lost on the server's stack.

The tcpdump is done on the server side.

The client received the SYN-ACK, sent the ACK and a data packet, then resent the data 4 more times. The server resent the SYN-ACK 4 seconds after the original one, which suggests the client's ACK was lost in the server's stack. The client responded with another ACK.

Then around 3s later the client resent data, and got the server's ACK. The client sent FIN at 10s because the client app has set a 10s timeout.

So the question is: the tcpdump shows that the ACK to the SYN-ACK arrived at the server, so under what circumstances would the server still resend the SYN-ACK? Is this a kernel issue or an app issue on the server side? And how can I debug it further?

Appreciate your help.


20:31:01.159098 IP client.cport > server.sport: S 2848162415:2848162415(0) win 5840 
20:31:01.159103 IP server.sport > client.cport: S 901143055:901143055(0) ack 2848162416 win 5792 
20:31:01.159192 IP client.cport > server.sport: . ack 1 win 46 
20:31:01.159276 IP client.cport > server.sport: P 1:426(425) ack 1 win 46 
20:31:01.380395 IP client.cport > server.sport: P 1:426(425) ack 1 win 46 
20:31:01.824367 IP client.cport > server.sport: P 1:426(425) ack 1 win 46 
20:31:02.712362 IP client.cport > server.sport: P 1:426(425) ack 1 win 46 
20:31:04.488358 IP client.cport > server.sport: P 1:426(425) ack 1 win 46 
20:31:05.159038 IP server.sport > client.cport: S 901143055:901143055(0) ack 2848162416 win 5792 
20:31:05.159157 IP client.cport > server.sport: . ack 1 win 46 
20:31:08.040317 IP client.cport > server.sport: P 1:426(425) ack 1 win 46 
20:31:08.040326 IP server.sport > client.cport: . ack 426 win 27 
20:31:11.159618 IP client.cport > server.sport: F 426:426(0) ack 1 win 46 
20:31:11.199139 IP server.sport > client.cport: . ack 427 win 27 
20:31:14.724604 IP server.sport > client.cport: . 1:1449(1448) ack 427 win 27 
20:31:14.724612 IP server.sport > client.cport: P 1449:1756(307) ack 427 win 27 
20:31:14.724776 IP client.cport > server.sport: R 2848162842:2848162842(0) win 0
20:31:14.724779 IP client.cport > server.sport: R 2848162842:2848162842(0) win 0
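
To make the timing in the trace easier to read, here is a minimal Python sketch that computes the gaps between the client's copies of the 425-byte data segment and between the two SYN-ACKs. The timestamps are copied from the trace above; the names `DATA_SEGMENT_TIMES`, `SYN_ACK_TIMES` and `gaps_ms` are made up for illustration.

```python
from datetime import datetime

# Timestamps of the client's "P 1:426(425)" segment, copied from the trace.
DATA_SEGMENT_TIMES = [
    "20:31:01.159276",
    "20:31:01.380395",
    "20:31:01.824367",
    "20:31:02.712362",
    "20:31:04.488358",
    "20:31:08.040317",
]

# Timestamps of the server's two SYN-ACKs, copied from the trace.
SYN_ACK_TIMES = ["20:31:01.159103", "20:31:05.159038"]


def gaps_ms(stamps):
    """Return the gap between consecutive timestamps, rounded to whole ms."""
    times = [datetime.strptime(s, "%H:%M:%S.%f") for s in stamps]
    return [round((b - a).total_seconds() * 1000) for a, b in zip(times, times[1:])]


if __name__ == "__main__":
    print("data segment gaps (ms):", gaps_ms(DATA_SEGMENT_TIMES))
    print("SYN-ACK gap (ms):", gaps_ms(SYN_ACK_TIMES))
```

The data segment gaps come out as roughly 221, 444, 888, 1776 and 3552 ms, each about double the previous one, which looks like ordinary exponential backoff of the client's retransmission timer. The SYN-ACK gap is about 4000 ms.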

Edit: There are a lot of listen queue overflows. Could a low listen() backlog in the app be causing this problem?

$ netstat -s | grep -i list
    210473 times the listen queue of a socket overflowed
    210473 SYNs to LISTEN sockets ignored
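
These counters are cumulative, so a single reading does not show whether overflows happen at the moment a connection times out. A rough Python sketch for comparing two snapshots of the same output follows. The helper names `listen_counters` and `growth` are hypothetical, and the patterns assume the exact wording shown above, which may differ between net-tools versions.

```python
import re

# The two lines from the `netstat -s | grep -i list` output quoted above.
SAMPLE = (
    "    210473 times the listen queue of a socket overflowed\n"
    "    210473 SYNs to LISTEN sockets ignored\n"
)

# Assumption: the wording matches the output above exactly.
PATTERNS = {
    "overflowed": re.compile(r"^\s*(\d+) times the listen queue of a socket overflowed"),
    "ignored": re.compile(r"^\s*(\d+) SYNs to LISTEN sockets ignored"),
}


def listen_counters(netstat_output):
    """Extract the two listen-queue counters from `netstat -s` text."""
    counters = {}
    for line in netstat_output.splitlines():
        for name, pattern in PATTERNS.items():
            match = pattern.match(line)
            if match:
                counters[name] = int(match.group(1))
    return counters


def growth(before, after):
    """Return how much each counter grew between two snapshots."""
    return {name: after[name] - before.get(name, 0) for name in after}


if __name__ == "__main__":
    print(listen_counters(SAMPLE))
```

Taking one snapshot before a burst of client requests and one after, then passing both to `growth`, would show whether the overflow count rises while the timeouts occur.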

Edit 2: This is my first post here, so please bear with me. 🙂

Client & Server kernel: 2.6.18-92.el5

Server app: listens on sport only. It processes each client request and sends a response back. Using strace, I found that the listen() backlog is 5.
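
For illustration, this is roughly what asking for a larger backlog looks like. The server's implementation language is not stated here, so Python is used purely as a sketch, and `read_somaxconn`, `effective_backlog` and `make_listener` are hypothetical helper names. On Linux the value passed to listen() is silently capped at net.core.somaxconn, so the sysctl may need raising as well.

```python
import socket


def read_somaxconn(path="/proc/sys/net/core/somaxconn"):
    """Return the system-wide backlog cap, or None if it cannot be read."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def effective_backlog(requested, somaxconn):
    """Linux caps the listen() backlog at net.core.somaxconn."""
    if somaxconn is None:
        return requested
    return min(requested, somaxconn)


def make_listener(host, port, backlog):
    """Create a TCP listener that requests the given accept-queue backlog."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    srv.listen(backlog)  # the server in this post effectively passes 5 here
    return srv


if __name__ == "__main__":
    cap = read_somaxconn()
    print("somaxconn:", cap)
    print("backlog 5 ->", effective_backlog(5, cap))
    print("backlog 1024 ->", effective_backlog(1024, cap))
```

Whether a larger backlog actually removes the timeouts here is untested; it is only the obvious thing to try given the overflow counters.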

There are 8 client systems, each running one instance of the client app. The client app sends a request to the server port and gets a response back from the server. A 10s timer is set after the client succeeds in connect(). Each client instance can send multiple connection requests, each on its own client port.

There could be a burst of concurrent requests from the 8 clients.

Edit 3: The tcpdump looks very similar to the one at http://forum.openvz.org/index.php?t=msg&goto=25678 , but that thread gives no root cause or solution.

Best Answer

I'd ask myself why a normal client would:

  1. shrink its previously advertised data window from 5840 to 46
  2. resend the same data segment 5 times (3 of them within 1 second!)

Also, if you want help from the community, keep in mind that we lost our telepathic abilities long ago, so we have no idea what software is involved here (starting with the Linux kernel version), what it is supposed to do, or how it is supposed to work.