Linux – Reasons for a TCP Timeout scenario

java · linux · networking · tcp · tomcat

I am currently investigating long-running connections of a Java/Tomcat-based web application. After ruling out any internal or application-based reasons, I am now down to the network layer. The reason I am investigating this issue is that we have seemingly random spikes in our response time monitoring. While investigating, I found that this behavior is not so random at all, but is triggered by certain client HTTP requests. The special thing about those connections is that they all originate from the same IP address and seem to go through a Bluecoat proxy, because I see an x-bluecoat-via HTTP header.

As I said, the application itself performs normally; only the end of the connection (from Tomcat's point of view) seems to be somehow delayed. The server does not talk directly to the client but sits behind an F5 load balancer, which should actually cache the answers (which might not happen because of an Accept-Encoding: identity header and the response being too large for the buffer).

I got a TCP dump; due to an unfortunate mistake I currently only see the packets from the LB to the appserver, not the packets sent by the appserver.

The dump contains multiple requests on the same TCP/IP connection, which is due to connection pooling done by the F5. The last HTTP request on this connection is the one that was flagged as long running (925836.442 ms) in our logging. What I see are the request packets, a series of ACKs that leads me to believe the appserver is writing its answer, and then finally two FIN, ACK packets followed by a RST, ACK, which is the last packet sent by the F5.

From a timing point of view, this all happens within 250 ms; the last packet is sent 15 minutes and 13 seconds before I see the response log entry on the appserver, which is written once Tomcat believes the response to be finished.

I'm kind of out of ideas at the moment and have a couple of open questions:

Is there any reason Linux would keep a connection open that has received a RST and not tell the application layer?

Is there any other timeout that could lead to this behavior? If this were the TCP retransmission timeout, I would expect to see more RSTs from the LB.

Any other idea why a connection that is closed on the wire would still appear open in the application layer?

How can something that happens in the application layer (a specific HTTP request) lead to reproducible behavior in the transport layer?

Maybe I'm completely on the wrong track and this is a connection keep-alive issue inside Tomcat?
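
Regarding the first question: my understanding (an assumption on my part, not something I have verified in this setup) is that the kernel does not proactively notify a blocked application of a received RST; the application only learns about it on its next read or write on that socket. A minimal Java sketch of that behaviour, with hypothetical port and timings:

    import java.io.OutputStream;
    import java.net.ServerSocket;
    import java.net.Socket;
    import java.net.SocketException;

    public class RstVisibilitySketch {
        public static void main(String[] args) throws Exception {
            // Hypothetical port, purely for illustration.
            try (ServerSocket server = new ServerSocket(9090);
                 Socket peer = server.accept()) {
                OutputStream out = peer.getOutputStream();
                // Simulate the application being busy producing its response
                // while the peer may already have sent FIN/RST on the wire.
                Thread.sleep(30_000);
                try {
                    // Only at this point does the JVM/kernel surface a reset:
                    // the first write after a received RST typically fails
                    // with "Connection reset" or "Broken pipe".
                    out.write("HTTP/1.1 200 OK\r\nContent-Length: 0\r\n\r\n".getBytes());
                    out.flush();
                } catch (SocketException e) {
                    System.err.println("Connection was already gone: " + e.getMessage());
                }
            }
        }
    }

If Tomcat's worker thread is still busy producing the response, that would at least be consistent with the response log entry appearing minutes after the RST on the wire.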

Best Answer

I can't really help on the networking layer, but on the Tomcat side there are several places where you can configure this: http://tomcat.apache.org/connectors-doc/reference/workers.html. You could try to override the timeout and configure it to close the connection after a certain amount of time.

The link also covers load balancer configurations, which might be helpful in your scenario.
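
The link above describes the mod_jk workers.properties directives; for a plain HTTP connector, the usual knobs are the connectionTimeout and keepAliveTimeout attributes. As a rough illustration only (not taken from your setup, and all values are hypothetical), here is a minimal sketch using a recent embedded Tomcat; on a standalone install the same attributes go on the Connector element in server.xml:

    import org.apache.catalina.connector.Connector;
    import org.apache.catalina.startup.Tomcat;

    public class ConnectorTimeoutSketch {
        public static void main(String[] args) throws Exception {
            Tomcat tomcat = new Tomcat();
            tomcat.setPort(8080);                   // hypothetical port

            Connector connector = tomcat.getConnector();
            // Close the connection after 20 s of waiting for request data.
            connector.setProperty("connectionTimeout", "20000");
            // Drop idle pooled keep-alive connections (e.g. from an F5) after 15 s.
            connector.setProperty("keepAliveTimeout", "15000");
            // Cap how many requests one pooled connection may carry before it is closed.
            connector.setProperty("maxKeepAliveRequests", "100");

            tomcat.start();
            tomcat.getServer().await();
        }
    }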
