CentOS – How to diagnose a large number of TIME_WAIT connections

centos networking socket tcpdump time-wait

We have a production issue on only one of our servers and have correlated slow performance with an abundance of sockets in the TIME_WAIT state. Without dragging this question into a long backstory: every time the server is slow, about 80% of its sockets are in TIME_WAIT (which we see by running netstat). Because TIME_WAIT sockets time out and go away on their own, what we see when the server is slow is bursts of TIME_WAITs cropping up very frequently (about every 5 – 10 minutes).

I did a little digging and see that TIME_WAITs occur when the server closes an active connection but keeps it around in case any delayed packets come through. Eventually TIME_WAIT times out.

Is there any way to see exactly why an individual socket went into the TIME_WAIT state to begin with? This is CentOS 5. Does Linux log this information anywhere under /var/log, or is there a way to run tcpdump and look for a specific pattern that leads to TIME_WAIT? Thanks in advance.
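As a starting point for narrowing down which connections are involved, the kernel's socket table can be read from /proc/net/tcp and the TIME_WAIT entries grouped by where they point. The following is only a sketch under some assumptions: the host is little-endian (x86), only IPv4 is considered, and local ports below 32768 are treated as service ports (that threshold, `EPHEMERAL_START`, is a guess at the default ephemeral range, so check `/proc/sys/net/ipv4/ip_local_port_range`). The function names are made up for this example, and it is written for a modern Python; the stock Python on CentOS 5 is old and may need adjustments.

```python
import os
import socket
import struct
from collections import Counter

PROC_PATH = "/proc/net/tcp"  # IPv4 only; IPv6 sockets live in /proc/net/tcp6
TIME_WAIT = "06"             # hex state code for TIME_WAIT in this file
EPHEMERAL_START = 32768      # assumption: low end of the ephemeral port range


def decode_addr(hex_addr):
    """Turn '0100007F:0050' into ('127.0.0.1', 80); assumes a little-endian host."""
    ip_hex, port_hex = hex_addr.split(":")
    ip = socket.inet_ntoa(struct.pack("<I", int(ip_hex, 16)))
    return ip, int(port_hex, 16)


def count_time_wait(proc_text):
    """Count TIME_WAIT sockets, grouped by local service port or remote endpoint."""
    counts = Counter()
    for line in proc_text.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) < 4 or fields[3] != TIME_WAIT:
            continue
        _, local_port = decode_addr(fields[1])
        remote_ip, remote_port = decode_addr(fields[2])
        if local_port < EPHEMERAL_START:
            # Local side looks like a service port: clients connected to us.
            key = "inbound to local port %d" % local_port
        else:
            # Local side looks ephemeral: this host connected out to a peer.
            key = "outbound to %s:%d" % (remote_ip, remote_port)
        counts[key] += 1
    return counts


if __name__ == "__main__" and os.path.exists(PROC_PATH):
    with open(PROC_PATH) as f:
        for key, n in count_time_wait(f.read()).most_common(10):
            print("%6d  %s" % (n, key))
```

Since TIME_WAIT is held by the side that closed the connection first, a large "outbound to host:port" bucket suggests a local process making many short-lived connections to that peer, while a large "inbound" bucket suggests the local service is closing client connections itself.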

Best Answer

Short answer: it is due to an app. The app creates sockets for a short time, closes them, and then immediately needs to open another socket. The sluggishness comes from the process(es) running out of sockets to use.

When creating a socket there are two relevant options: SO_REUSEADDR and SO_REUSEPORT. They have somewhat similar functions, but I suspect SO_REUSEPORT is not available in CentOS 5. Either way, setting the option on the socket (via setsockopt) allows the port to be reused immediately.
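For illustration, here is roughly what setting SO_REUSEADDR looks like, sketched in Python (the app in question may be in any language, but the `setsockopt` call maps directly onto the C API). The helper name `make_listener` and its defaults are made up for this example. Note that the option has to be set before `bind()`, and that it mainly helps a socket bind to a local address that still has connections in TIME_WAIT.

```python
import socket


def make_listener(host="127.0.0.1", port=0):
    """Create a listening TCP socket with SO_REUSEADDR enabled.

    port=0 asks the kernel to pick any free port (handy for testing).
    """
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Must be set before bind(); allows binding even if the address
    # still has old connections lingering in TIME_WAIT.
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    srv.listen(5)
    return srv
```

Something like `srv = make_listener("0.0.0.0", 8080)` would then be used in place of the plain socket/bind/listen sequence.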

So a commonly used fix is to recode the app. It is probably a network app that connects for a few seconds and then ends the session.
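One way such a recode might look is to keep a single connection open and send several requests over it, rather than opening and closing a socket per request. The sketch below is only an illustration of that idea, not the actual app: the newline-delimited echo protocol and the names `echo_server`, `send_all_on_one_connection` and `demo` are all invented for the example.

```python
import socket
import threading


def echo_server(listener):
    """Accept one client and echo its bytes back until it disconnects."""
    conn, _ = listener.accept()
    with conn:
        while True:
            data = conn.recv(4096)
            if not data:  # client closed the connection
                break
            conn.sendall(data)


def send_all_on_one_connection(addr, messages):
    """Send every message over a single TCP connection and collect the replies."""
    replies = []
    with socket.create_connection(addr) as sock, sock.makefile("rb") as reader:
        for msg in messages:
            sock.sendall(msg.encode("utf-8") + b"\n")
            replies.append(reader.readline().decode("utf-8").rstrip("\n"))
    return replies


def demo():
    """Run a throwaway local echo server and talk to it over one connection."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    listener.listen(1)
    worker = threading.Thread(target=echo_server, args=(listener,))
    worker.start()
    replies = send_all_on_one_connection(listener.getsockname(), ["a", "b", "c"])
    worker.join()
    listener.close()
    return replies
```

Here three requests leave behind a single closed connection instead of three, so at most one socket ends up in TIME_WAIT for that batch of work.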