We had this exact same problem. Disabling TCP timestamps alone solved it:
sysctl -w net.ipv4.tcp_timestamps=0
To make the change permanent, add an entry to /etc/sysctl.conf.
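For example (assuming a sysctl.conf-based setup; distributions that use /etc/sysctl.d/ would take a drop-in file there instead):

```shell
# Disable TCP timestamps for the running kernel (requires root)
sysctl -w net.ipv4.tcp_timestamps=0

# Persist the setting across reboots
echo "net.ipv4.tcp_timestamps = 0" >> /etc/sysctl.conf
sysctl -p    # reload settings from /etc/sysctl.conf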
Be very careful about disabling the TCP window scale option. It is important for maximum performance over the internet: without it, the receive window is capped at 64 KiB, so someone with a 10 megabit/sec connection will see suboptimal transfer rates once the round-trip time (roughly the same as ping) exceeds about 55 ms.
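To see where the 55 ms figure comes from: without window scaling the advertised receive window tops out at 65535 bytes, and throughput is bounded by window divided by round-trip time. A quick back-of-the-envelope check:

```shell
# Max throughput without window scaling = 65535 bytes * 8 bits / RTT.
# At RTT = 55 ms this is about 9.5 Mbit/s -- already below a 10 Mbit/s link.
awk 'BEGIN { printf "%.1f Mbit/s\n", 65535 * 8 / 0.055 / 1e6 }'
```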
We really noticed this problem when there were multiple devices behind the same NAT. I suspect that the server might have been confused seeing timestamps from Android devices and OSX machines at the same time since they put completely different values in the timestamp fields.
So, this is a neat question.
Initially, I was surprised that you saw any connections in SYN_RECV state with SYN cookies enabled. The beauty of SYN cookies is that a server can statelessly participate in the TCP 3-way handshake using cryptography, so I would expect the server not to track half-open connections at all; that is the very state that isn't being kept.
In fact, a quick peek at the source (tcp_ipv4.c) shows interesting information about how the kernel implements SYN cookies. Essentially, despite turning them on, the kernel behaves as it would normally until its queue of pending connections is full. This explains your existing list of connections in SYN_RECV state.
Only when the queue of pending connections is full, AND another SYN packet (connection attempt) is received, AND it has been more than a minute since the last warning message, does the kernel send the warning message you have seen ("sending cookies"). SYN cookies are sent even when the warning message isn't; the warning message is just to give you a heads up that the issue hasn't gone away.
Put another way, if you turn off SYN cookies, the message will go away. That is only going to work out for you if you are no longer being SYN flooded.
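If you want to confirm this behavior on your own machine, the standard procps and iproute2 tools can show both the setting and the current half-open connections:

```shell
# 1 means SYN cookies are enabled, but they are only used
# once the listen socket's SYN backlog fills up
sysctl net.ipv4.tcp_syncookies

# Count connections currently in SYN_RECV (skip the header line)
ss -n state syn-recv | tail -n +2 | wc -l
```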
To address some of the other things you've done:
net.ipv4.tcp_synack_retries:
- Increasing this won't have any positive effect for those incoming connections that are spoofed, nor for any that receive a SYN cookie instead of server-side state (no retries for them either).
- For incoming spoofed connections, increasing this increases the number of packets you send to a fake address, and possibly the amount of time that that spoofed address stays in your connection table (this could be a significant negative effect).
- Under normal load / number of incoming connections, the higher this is, the more likely you are to quickly / successfully complete connections over links that drop packets. There are diminishing returns for increasing this.
net.ipv4.tcp_syn_retries: Changing this cannot have any effect on inbound connections (it only affects outbound connections).
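The current values of both knobs are easy to inspect (on recent kernels the defaults are typically 5 for tcp_synack_retries and 6 for tcp_syn_retries):

```shell
# How many times the kernel retransmits SYN-ACKs for incoming connections
sysctl net.ipv4.tcp_synack_retries

# How many times the kernel retransmits SYNs for outbound connections
sysctl net.ipv4.tcp_syn_retries
```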
The other variables you mention I haven't researched, but I suspect the answers to your question are pretty much right here.
If you aren't being SYN flooded and the machine is responsive to non-HTTP connections (e.g. SSH), there is probably a network problem, and you should have a network engineer help you look at it. If the machine is generally unresponsive even when you aren't being SYN flooded, it sounds like a serious load problem, since creating TCP connections is low-level and not resource-intensive.
Best Answer
The initial retransmission timeout setting is hardcoded in the kernel to be 1 second in modern versions: https://elixir.bootlin.com/linux/v5.9.11/source/include/net/tcp.h#L142
The constant is referenced in tcp.c: https://elixir.bootlin.com/linux/v5.9/source/net/ipv4/tcp.c#L420
You cannot change it without recompiling the kernel: How can I tune the initial TCP retransmit timeout? (It seems that it used to be 3 seconds in older versions.)
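A sketch of the resulting retransmission schedule, assuming the standard doubling backoff from that 1-second initial RTO (the kernel caps the timeout at TCP_RTO_MAX, 120 seconds):

```shell
# Each retransmission doubles the previous timeout, starting at 1 s,
# capped at 120 s (TCP_RTO_MAX) -- so the first six attempts wait
# 1, 2, 4, 8, 16, and 32 seconds respectively.
rto=1
for attempt in 1 2 3 4 5 6; do
    echo "attempt $attempt: RTO ${rto}s"
    rto=$(( rto * 2 ))
    if [ "$rto" -gt 120 ]; then rto=120; fi
done
```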