Linux – TCP Source Port Reuse (and delay)

linux port tcp windows

We have an application making a lot of calls to a remote web server/service. The client app is JBoss/Java on Linux (Red Hat 5); the remote server is Windows 2008. There is a Cisco ACE in the way, but there is no NAT'ing going on.

We've been noticing that when Linux/JBoss reuses a source port to make the HTTP call, we can get "Connection refused". It happens when the client reuses that source port within a couple of minutes.

What I see when running tcpdump/wireshark on both sides, is something like this:

Request#1: Source port 6666, Destination port 80

Client –> SYN
Server –> SYN ACK
Client –> ACK
Client –> GET /
Server –> Returns data
Client –> ACK
Server –> FIN ACK
Client –> FIN ACK
Server –> ACK

Request#2: Same source and destination port is a success.

Request#3: Same source and destination port is a success.

Request#4: Same source and destination port but this time a failure ("Connection refused") and it looks like this:

Client –> SYN
Client –> SYN (retransmission)
Client –> RST, ACK

The server sees both SYNs but never replies with a SYN ACK or an RST (it does see the RST from the client).

After some searching I came across a potential issue with TCP timestamps. We made sure the ACE lets those through, and I can verify Windows is seeing them. I also see the connection in a TIME_WAIT state on the server/Windows side (but I see this even after the first successful GET, and after #2 and #3, all of which were successful). The port is not open or in TIME_WAIT on the client side.

One thing I've been looking at is decreasing the TcpTimedWaitDelay registry entry on the Windows side to 30 seconds. I haven't tested this yet, but I think it should work if that is where our problem lies.

I've widened the ephemeral port range on the client/Linux side to roughly 15000 – 60000 (from the default of about 30000 to 60000), but to no avail. I was just hoping that more available ports would translate into a longer delay before any given source port gets picked again, since the selection is random.
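A rough way to see why widening the range alone did not help: if source ports were picked uniformly at random (an assumption; the kernel's real selection is not strictly uniform), the chance that one particular port comes up again within the next N connections only shrinks modestly when the range grows by half. The function name and the load figure below are hypothetical, for illustration only.

```python
def reuse_probability(range_size: int, later_connections: int) -> float:
    """Chance that one specific port is chosen again within the next
    `later_connections` picks, assuming uniform random selection."""
    if range_size <= 0:
        raise ValueError("range_size must be positive")
    return 1.0 - (1.0 - 1.0 / range_size) ** later_connections


if __name__ == "__main__":
    # Hypothetical load: 2000 outbound connections inside the TIME_WAIT window.
    for low, high in [(30000, 60000), (15000, 60000)]:
        size = high - low + 1
        print(f"{low}-{high}: {reuse_probability(size, 2000):.2%}")
```

With a steady stream of connections, some port is likely to be reused inside the server's TIME_WAIT window no matter how wide the range is, which matches what we are seeing.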

I find it odd that the server/Windows side is seeing the SYNs come through yet not responding, which makes me think it treats the SYN as belonging to the previous session or something.

I'm not sure I would like this but I was wondering if there's a way to tell Linux to NOT reuse a source port if it had been recently used? Like some sort of delay in that logic (if there is one)?
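I am not aware of a kernel setting that does exactly this, but the delay could be approximated in the application by binding the socket to an explicit source port before connecting. Below is a minimal Python sketch of that idea (the same bind-then-connect approach should be possible from Java); `CooldownPortPicker` and `connect_from` are hypothetical names, and the cooldown is counted from when a port is handed out, not from when the connection closes.

```python
import random
import socket
import time


class CooldownPortPicker:
    """Hands out source ports, skipping any used within `cooldown` seconds."""

    def __init__(self, low, high, cooldown, clock=time.monotonic):
        self.low, self.high = low, high
        self.cooldown = cooldown
        self.clock = clock
        self.last_used = {}  # port -> time it was last handed out

    def pick(self):
        now = self.clock()
        # Forget ports whose cooldown has expired.
        self.last_used = {p: t for p, t in self.last_used.items()
                          if now - t < self.cooldown}
        free = [p for p in range(self.low, self.high + 1)
                if p not in self.last_used]
        if not free:
            raise RuntimeError("every port in the range is cooling down")
        port = random.choice(free)
        self.last_used[port] = now
        return port


def connect_from(picker, host, port, timeout=5.0):
    """Open a TCP connection whose source port comes from `picker`."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.bind(("", picker.pick()))  # explicit source port, any local address
        sock.connect((host, port))
    except OSError:
        sock.close()
        raise
    return sock
```

The picker only knows about ports it handed out itself, so `bind` can still fail if another process holds the port. Rebuilding the free list on every call is also slow for wide ranges, so this is a sketch rather than something to drop into production.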

It's not like we're running out of available ports or anything but sometimes, because it's random, a source port does get reused within a couple minutes and that's when we see issues.

You all have any other thoughts on this?

Thanks!

UPDATE

I set TcpTimedWaitDelay to 30 seconds on the Windows server. As long as a call with the source port being reused comes after that 30 seconds there are no problems.

I believe the ACE is still partly to blame, possibly through some sort of SYN attack protection (the ACE is a security device), because if I bypass the ACE I have no issues.

But having 2MSL set to 30 seconds seems to be a good enough fix for now.

Best Answer

I don't know enough about Windows socket cycling to answer this, but I am guessing the server closes connections and sockets are sitting in TIME_WAIT state where they cannot be used again until they expire.

The "right" way to solve this problem is to add more tuples - increase outgoing ports on the client (which you have done), add listening ports on the server, add IP aliases onto your interfaces, and make the apps use those additional IPs/ports.
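As a rough illustration of spreading connections over more tuples, here is a Python sketch that rotates over local IP aliases and server ports. The addresses, the host name and the function names are made up for the example; it assumes the aliases already exist on the client interface and that the server really listens on every port listed.

```python
import itertools
import socket


def endpoint_cycle(local_ips, server_host, server_ports):
    """Yield (local_ip, server_host, server_port) combinations round-robin."""
    return itertools.cycle(
        (ip, server_host, port) for ip in local_ips for port in server_ports
    )


def connect_next(endpoints, timeout=5.0):
    """Connect using the next (local IP, server port) combination."""
    local_ip, host, port = next(endpoints)
    # Port 0 lets the kernel choose the source port on the given local address.
    return socket.create_connection((host, port), timeout=timeout,
                                    source_address=(local_ip, 0))


if __name__ == "__main__":
    # Hypothetical aliases and ports; substitute real ones.
    endpoints = endpoint_cycle(["10.0.0.1", "10.0.0.2"], "web.example", [80, 8080])
    for _ in range(4):
        print(next(endpoints))
```

Each extra local IP or server port multiplies the number of distinct tuples, so the same source port can come up again without colliding with a connection still in TIME_WAIT on the server.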

A "less right" way is to decrease the TW timeout, which I think you are doing with TcpTimedWaitDelay.

A "not very right at all but still quite popular" way is to enable socket recycling. Linux has the tcp_tw_reuse and tcp_tw_recycle options (under net.ipv4); maybe Windows has an equivalent.

The last two options break the TCP RFC. Maybe the ACE has a problem with doing that?
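To check what the Linux client currently has set for the recycling options mentioned above, something like the following Python sketch could be used. It assumes the settings are exposed under /proc/sys as net.ipv4.tcp_tw_reuse and net.ipv4.tcp_tw_recycle; newer kernels no longer have tcp_tw_recycle, so a missing file is reported as None. The helper name `read_sysctl` is my own.

```python
from pathlib import Path


def read_sysctl(name, root="/proc/sys"):
    """Return the text of a sysctl such as 'net.ipv4.tcp_tw_reuse', or None."""
    path = Path(root).joinpath(*name.split("."))
    try:
        return path.read_text().strip()
    except OSError:
        return None  # not present on this kernel, or not readable


if __name__ == "__main__":
    for key in ("net.ipv4.tcp_tw_reuse", "net.ipv4.tcp_tw_recycle",
                "net.ipv4.ip_local_port_range"):
        print(key, "=", read_sysctl(key))
```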