So, as you've found out, TCP congestion control is a pretty complicated area.
For this particular case, with such small requests, you're going to want to keep connections open as long as possible: one connection per request costs five packets each, whereas you can get the average down to a little more than two packets per request if you keep connections around.
TCP_NODELAY is the right thing for a game server; you want your 256 bytes delivered right away, and since that's less than a whole segment, Nagle's algorithm will hold it back unless you set TCP_NODELAY.
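As a minimal sketch of what setting the option looks like (the helper name `make_low_latency_socket` is just mine for illustration, and a real server would set it on each accepted connection):

```python
import socket

def make_low_latency_socket() -> socket.socket:
    """Create a TCP socket with Nagle's algorithm disabled (hypothetical helper)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # TCP_NODELAY=1 asks the kernel to send small writes immediately
    # instead of holding them back to coalesce into a fuller segment.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock

if __name__ == "__main__":
    s = make_low_latency_socket()
    # A non-zero value means the option is enabled.
    print(s.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0)
    s.close()
```

With Nagle off, it pays to send each 256-byte message in a single write, since every write can become its own packet.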
If your servers have loads of memory, the memory options are no big deal; recent kernels have sensible defaults.
As for congestion control algorithms, you spotted Westwood. The other option is CUBIC. You can just go with one, or you can do some research and benchmark them. That could be quite a bit of work, but for 10M clients it's worth it. So I'd be looking into running a simulation using a traffic generator on a Mac or three (since they have the same TCP implementation as the phone), a Linux box in between acting as a router (more about this shortly), and one of your servers, to see how it goes.
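If you do benchmark them, on Linux you can pick the algorithm per socket rather than system-wide, which makes A/B runs easier. A rough sketch, assuming a Linux host and a Python build that exposes `socket.TCP_CONGESTION`; the helper names are mine, and the algorithm has to be available in the running kernel for the call to succeed:

```python
import socket

def parse_available(text: str) -> list:
    """Split the space-separated list found in
    /proc/sys/net/ipv4/tcp_available_congestion_control."""
    return text.split()

def try_set_congestion_control(sock: socket.socket, name: str) -> bool:
    """Try to select a congestion control algorithm for one socket.

    Returns False when the platform lacks TCP_CONGESTION or the kernel
    rejects the name (for example, the module is not loaded).
    """
    opt = getattr(socket, "TCP_CONGESTION", None)  # Linux-only constant
    if opt is None:
        return False
    try:
        sock.setsockopt(socket.IPPROTO_TCP, opt, name.encode("ascii"))
    except OSError:
        return False
    return True

if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    for algo in ("westwood", "cubic"):
        print(algo, try_set_congestion_control(s, algo))
    s.close()
```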
Now, that middle Linux box should run ns-3 so you can simulate a more complicated path than just an Ethernet switch. You then capture some packet traces on the sending end of the TCP connections, and analyse them with tcptrace or the tcptrace graphing modes of Wireshark. The tcptrace documentation is a good introduction to analysing TCP congestion behaviour.
For a general sense of the scale of your problem, `netstat -s` will track your total number of retransmissions.
# netstat -s | grep retransmitted
368644 segments retransmitted
You can also grep for `segments` to get a more detailed view:
# netstat -s | grep segments
149840 segments received
150373 segments sent out
161 segments retransmitted
13 bad segments received
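The raw counters mean more as a ratio of retransmitted to sent segments. A small sketch that computes it from text in the format shown above; the function name is mine, and it assumes the exact "segments sent out" / "segments retransmitted" wording, which may differ between netstat versions:

```python
import re

def retransmit_ratio(netstat_output: str) -> float:
    """Return retransmitted / sent segments from `netstat -s` style text."""
    counts = {}
    for line in netstat_output.splitlines():
        m = re.match(r"\s*(\d+) segments (sent out|retransmitted)", line)
        if m:
            counts[m.group(2)] = int(m.group(1))
    return counts["retransmitted"] / counts["sent out"]

if __name__ == "__main__":
    sample = """\
149840 segments received
150373 segments sent out
161 segments retransmitted
13 bad segments received
"""
    print(f"{retransmit_ratio(sample):.2%}")  # prints 0.11%
```

The counters are cumulative since boot, so for a live picture you'd take two samples and compare the differences.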
For a deeper dive, you'll probably want to fire up Wireshark.
In Wireshark, set your filter to `tcp.analysis.retransmission` to see retransmissions by flow.
That's the best option I can come up with.
Other dead ends explored:
- netfilter/conntrack tools don't seem to keep retransmits
- stracing `netstat -s` showed that it is just printing /proc/net/netstat
- column 9 in /proc/net/tcp looked promising, but it unfortunately appears to be unused.
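Since the counters ultimately come from /proc, you can also read them directly instead of scraping netstat. A sketch of a parser, assuming the paired layout where a line of field names is followed by a line of values with the same prefix; the function name is mine, and the sample values below are made up for illustration:

```python
def parse_proc_net_pairs(text: str) -> dict:
    """Parse 'Prefix: names...' / 'Prefix: values...' line pairs into nested dicts."""
    stats = {}
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # Lines come in pairs: a header line, then the matching value line.
    for header, values in zip(lines[0::2], lines[1::2]):
        prefix, names = header.split(":", 1)
        value_prefix, numbers = values.split(":", 1)
        if prefix != value_prefix:
            raise ValueError(f"mismatched pair: {prefix!r} vs {value_prefix!r}")
        stats[prefix] = dict(zip(names.split(), (int(n) for n in numbers.split())))
    return stats

if __name__ == "__main__":
    # Illustrative sample only; on a real system read /proc/net/netstat.
    sample = "TcpExt: TCPLostRetransmit TCPFastRetrans\nTcpExt: 3 42\n"
    print(parse_proc_net_pairs(sample)["TcpExt"])
```

These are still system-wide totals, so this doesn't give you per-connection numbers either.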
Best Answer
Make sure you're running with a sufficiently low MTU, if possible. A single 1500-byte packet takes ~6 s to transmit on your link (presuming you meant 2 kilobits per second, not bytes). And you're losing a fair number of them (probably more than 5%, if that is your packet loss with `ping remote-end`, not `ping -s «MAX-SIZE» remote-end`), requiring resending the entire packet.
Strictly speaking, IPv4 can go down to an MTU of 68 (which is too small anyway), but Linux's PMTU discovery is limited to no smaller than 552, and possibly other parts of the stack fail below 128 bytes or so.
Note that you are operating at well under a tenth of the bandwidth TCP's designers had back in 1973.
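To put numbers on the MTU trade-off, here is a back-of-the-envelope sketch. It assumes a 2 kbit/s link and 40 bytes of IPv4 plus TCP headers without options, and it ignores link-layer framing; the function names are mine:

```python
def transmit_seconds(packet_bytes: int, link_bits_per_second: float) -> float:
    """Time to put one packet on the wire, ignoring link-layer framing."""
    return packet_bytes * 8 / link_bits_per_second

def payload_efficiency(mtu: int, header_bytes: int = 40) -> float:
    """Fraction of each packet that is payload (40 = IPv4 + TCP, no options)."""
    return (mtu - header_bytes) / mtu

if __name__ == "__main__":
    link = 2000  # 2 kilobits per second
    for mtu in (1500, 552, 128):
        print(f"MTU {mtu}: {transmit_seconds(mtu, link):.3f} s per packet, "
              f"{payload_efficiency(mtu):.1%} payload")
```

A smaller MTU means each lost packet costs less time to resend, but headers eat a larger share of the link, so somewhere around the 552 floor looks like a reasonable place to start testing.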