So, as you've found out, TCP congestion control is a pretty complicated area.
In this particular case the requests are small, so keep connections open for as long as possible: a new connection per request costs five packets each time, whereas reusing connections brings the average down to a little over two packets per request.
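A rough way to see where those figures come from, assuming (my assumption, not a measurement) a fixed overhead of three packets to open and close each connection plus two packets per request:

```shell
# Hypothetical helper: average packets per request when one connection
# carries $1 requests, assuming 3 packets of overhead and 2 per request.
avg_packets_per_request() {
    awk -v n="$1" 'BEGIN { printf "%.2f\n", (3 + 2 * n) / n }'
}

avg_packets_per_request 1     # one connection per request
avg_packets_per_request 100   # connection reused for 100 requests
```

The overhead is paid once per connection, so the longer a connection lives, the closer the average gets to two.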
TCP_NODELAY is the right thing for a game server: you want your 256 bytes delivered right away, and since that's less than a full segment, Nagle's algorithm can hold it back unless you set TCP_NODELAY.
If your servers have loads of memory, the memory options are no big deal; recent kernels get them right by default.
As for congestion control algorithms, you spotted Westwood. The other option is CUBIC. You can just pick one, or you can do some research and benchmark them. That could be quite a bit of work, but for 10M clients it's worth it. So I'd look into running a simulation using a traffic generator on a Mac or three (since they have the same TCP implementation as the phone), a Linux box in between acting as a router (more about this shortly), and one of your servers, to see how it goes.
Now, that middle Linux box should run ns-3 so you can simulate a more complicated path than just an ethernet switch. You then capture some packet traces on the sending end of the TCP connections, and analyse them with tcptrace or the tcptrace graphing modes of wireshark. The tcptrace documentation is a good introduction to analysing TCP congestion behaviour.
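One benchmark run on the server might look roughly like the following. This is a sketch: eth0, port 4000 and the capture file name are placeholders I've made up, and you'd repeat it with cubic in place of westwood.

```shell
# See which algorithms this kernel offers and which one is active.
sysctl net.ipv4.tcp_available_congestion_control
sysctl net.ipv4.tcp_congestion_control

# Load Westwood if it is built as a module, then use it for new connections.
modprobe tcp_westwood
sysctl -w net.ipv4.tcp_congestion_control=westwood

# Capture headers only on the sending side while the traffic generator runs.
# Stop the capture with Ctrl-C when the run is over.
tcpdump -i eth0 -s 96 -w westwood.pcap 'tcp port 4000'

# Summarise each connection, then write xplot graph files for all of them.
tcptrace -l westwood.pcap
tcptrace -G westwood.pcap
```

Keeping one capture file per algorithm makes it easy to compare the time-sequence graphs side by side.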
You're very nearly there, but it's possible that you've been cargo-culting someone else's work, possibly on ssh rate-limiting, without really understanding it. Please note that I'm not criticising you: building on other people's work is an excellent idea in the free software community; but you should understand why they've done what they've done, so you don't fail to use it correctly.
I set up a test rig, using nc (netcat) to flood UDP traffic from a machine called bill to a machine called risby with the following lines:
risby% nc -l -u 12345
bill% seq 1 10000000 | nc -u risby 12345
This produced a very-rapidly increasing list of numbers from risby's netcat, much like the command-flooding you've been having.
But when I created two new rules for risby's iptables which filtered only UDP traffic to port 12345 without regard for state, it worked fine:
iptables -I INPUT 1 -p udp --dport 12345 -m recent --set --name ddos
iptables -I INPUT 2 -p udp --dport 12345 -m recent --rcheck --seconds 1 --hitcount 5 --name ddos -j DROP
When I re-ran the netcats, the first few packets from bill got through on risby, and the numbers climbed rapidly to about 1800, but then it stalled completely and no further traffic was received from bill.
Note that it's quite important that these rules come early in your iptables INPUT chain, which is why I've inserted them at lines 1 and 2 respectively.
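To see why the flood stalls completely rather than merely slowing down, here is a simplified model of the two rules in awk. It is my own approximation, not the kernel's code: every arrival is recorded, as --set does, and a packet is dropped once the window holds at least --hitcount arrivals including itself. Timestamps are whole seconds in ascending order.

```shell
# Hypothetical helper: $1 = --seconds, $2 = --hitcount.
# Reads one arrival time per line on stdin and prints the verdict for each.
simulate_recent() {
    awk -v window="$1" -v hits="$2" '
        {
            stamp[NR] = $1      # rule 1 (--set): always record the arrival
            count = 0
            # rule 2 (--rcheck): count recorded arrivals inside the window
            for (i = NR; i >= 1 && $1 - stamp[i] <= window; i--) count++
            print $1, (count >= hits ? "DROP" : "ACCEPT")
        }'
}

# Six packets in the same second, then one more three seconds later.
printf '%s\n' 0 0 0 0 0 0 3 | simulate_recent 1 5
```

Because dropped packets are still recorded by the first rule, a sender that keeps flooding never falls back under the threshold.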
Edit:
Increase the rate, and require it to be sustained for longer; perhaps --seconds 10 --hitcount 50? Eventually you'll reach a threshold where few legitimate clients are affected, but the DDoS is still substantially throttled. Note that friendly-fire is always a possibility in this kind of layer-3 throttling; my own ssh server limits new connections to two per 60s window, which makes repeated scps quite slow. But it's a price I'm willing to pay, and to do better requires layer-4 throttling, which means the application has to be throttling-aware. iptables can't help you there.
I note that --hitcount can take no value higher than the ip_pkt_list_tot parameter of the xt_recent kernel module, and if the value's exceeded an error is thrown at rule-creation time:
[root@risby scratch]# iptables -A INPUT -p udp -m recent --rcheck --seconds 1 --hitcount 50 --name ddos -j DROP
iptables: Invalid argument. Run `dmesg' for more information.
But this value can be set as high as 255 at module insertion time. Following the suggestions in this blog entry, it's possible to reload the module, setting the parameter explicitly:
[root@risby scratch]# rmmod xt_recent
[root@risby scratch]# modprobe xt_recent ip_pkt_list_tot=100
[root@risby scratch]# iptables -A INPUT -p udp -m recent --rcheck --seconds 1 --hitcount 50 --name ddos -j DROP
[root@risby scratch]#
Note how the --hitcount 50 no longer causes errors. You may need to flush the INPUT chain (iptables -F INPUT) and any other chains that use the recent module before you can remove and reinsert the xt_recent module.
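A parameter given on the modprobe command line is lost the next time the module is loaded, for example after a reboot. One way to keep it, assuming your distribution reads /etc/modprobe.d (the file name xt_recent.conf is my own choice):

```shell
# Make the larger packet list the default whenever xt_recent is loaded.
echo 'options xt_recent ip_pkt_list_tot=100' > /etc/modprobe.d/xt_recent.conf

# After reloading the module, confirm the value the kernel is actually using.
cat /sys/module/xt_recent/parameters/ip_pkt_list_tot
```

Your iptables rules still have to be restored separately at boot, by whatever mechanism your distribution provides.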
Best Answer
As you are stating that you "need the port", I assume that you are offering some kind of public service on the DoS-attacked UDP port. Using HAProxy would not help you as HAProxy
The available options depend on the characteristics of the "bad" packets.
If you can identify them from the header information (IP source/destination address, UDP source/destination port), your best bet would be to ask your ISP to filter packets matching the appropriate criteria.
If you need content inspection or state matching, the ISP probably is not going to be able to help (although it would not hurt to ask), and you would need to set up your own packet-filtering router able to filter by the defined criteria. pf/BSD as well as netfilter/Linux would likely be able to do the job.
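If the bad packets can only be recognised by their payload, netfilter's string match is one way to do it on your own filtering box. A minimal sketch, where port 12345 and the byte pattern BADSIG are made-up placeholders for whatever actually distinguishes the attack traffic:

```shell
# Drop UDP datagrams to the service port whose payload contains the pattern.
# --algo bm selects the Boyer-Moore search; 12345 and BADSIG are placeholders.
iptables -I INPUT 1 -p udp --dport 12345 -m string --algo bm --string 'BADSIG' -j DROP
```

Payload matching costs more CPU per packet than header matching, so it's worth checking that the filtering box can keep up at attack rates.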