There are a lot of factors that determine how performance feels here.
One tweak you might consider is setting up Jumbo Frames. Scott Lowe has a recent blog post here that shows some of what he did to achieve this.
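If you go that route, it's worth verifying that the larger MTU actually took effect on the storage-facing interfaces. Just as a rough sanity check, here's a minimal Linux-only sketch; "eth0" is only an example interface name, so substitute your iSCSI NIC:

```python
import fcntl
import socket
import struct

SIOCGIFMTU = 0x8921  # Linux ioctl request: read an interface's MTU


def get_mtu(ifname: str) -> int:
    """Return the MTU currently configured on a network interface (Linux only)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        # struct ifreq: 16-byte interface name followed by the MTU field
        ifreq = struct.pack("16sI", ifname.encode(), 0)
        result = fcntl.ioctl(s.fileno(), SIOCGIFMTU, ifreq)
        return struct.unpack("16sI", result)[1]


if __name__ == "__main__":
    iface = "eth0"  # example name - substitute your iSCSI-facing NIC
    mtu = get_mtu(iface)
    status = "jumbo frames active" if mtu >= 9000 else "standard frames"
    print(f"{iface}: MTU {mtu} ({status})")
```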
You mention that the guests will be running with low CPU load - those are always great candidates for virtualization - but the difference between Fibre Channel and iSCSI doesn't really come into play there.
If your vm guests are going to be running storage-intensive operations, then you have to consider that the speed of transferring a read/write operation from the VM Host to the storage array may become your bottleneck.
Since a typical iSCSI transfer rate is 1Gbps (over Ethernet), while FC usually runs at 2-4Gbps (depending on how much cash you're willing to spend), you could say FC's raw transfer speed is roughly two to four times faster.
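Back-of-the-envelope, that works out to something like the following; this ignores encoding and protocol overhead (FC's 8b/10b encoding and iSCSI's TCP/IP headers both shave off a noticeable chunk in practice), so treat it as a ceiling rather than a measurement:

```python
# Gross line-rate comparison, ignoring encoding and protocol overhead.
def gbps_to_megabytes_per_sec(gbps: float) -> float:
    return gbps * 1000 / 8  # 1 Gbps is roughly a 125 MB/s payload ceiling

for name, gbps in [("1Gb iSCSI", 1), ("2Gb FC", 2), ("4Gb FC", 4)]:
    print(f"{name}: ~{gbps_to_megabytes_per_sec(gbps):.0f} MB/s")
# Prints roughly 125, 250 and 500 MB/s respectively.
```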
There are also the newer 10GbE switches, but your PowerVault and PowerConnect don't support that yet.
However, that doesn't necessarily mean the machines will run faster; if they're running applications with low I/O, they may well perform at the same speed on either.
The debate over which is better is never-ending, and it ultimately comes down to your own evaluation and results.
We have multiple deployments of FC-based mini-clouds and iSCSI-based mini-clouds, and they both work pretty well. We're finding that the bottleneck is at the storage array level, not iSCSI traffic over 1Gb Ethernet.
So, as you've found out, TCP congestion control is a pretty complicated area.
For this particular case, because the requests are small, you'll want to keep connections open as much as possible: opening a new connection per request costs around five packets each time (handshake and teardown overhead), whereas keeping connections around gets the average down to a little more than two packets per request.
TCP_NODELAY is the right thing for a game server: you want your 256 bytes delivered right away, and since that's not a whole segment, Nagle's algorithm will hold it back unless you set TCP_NODELAY.
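A minimal sketch of both points together - the host, port, and 256-byte payload below are placeholders, not anything from your setup: open the socket once, set TCP_NODELAY, and reuse the connection for every request instead of reconnecting each time.

```python
import socket


def open_game_connection(host: str, port: int) -> socket.socket:
    """Open one long-lived TCP connection and disable Nagle's algorithm."""
    s = socket.create_connection((host, port))
    # TCP_NODELAY: push small writes out immediately instead of letting
    # Nagle's algorithm wait in the hope of filling a whole segment.
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return s


# Reuse the same connection for many requests, so the handshake/teardown
# packet overhead is paid once rather than on every request.
conn = open_game_connection("game.example.com", 4000)  # placeholder endpoint
try:
    for _ in range(100):
        conn.sendall(b"\x00" * 256)  # stand-in for a 256-byte game message
        reply = conn.recv(4096)      # read whatever the server sends back
finally:
    conn.close()
```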
If your servers have plenty of memory, the memory-tuning options are no big deal; recent kernels get them right by default.
As for congestion control algorithms, you spotted Westwood; the other option is CUBIC. You can just pick one, or you can do some research and benchmark them. That could be quite a bit of work, but for 10M clients it's worth it. So I'd look into running a simulation with a traffic generator on a Mac or three (since they have the same TCP implementation as the phone), a Linux box in between acting as a router (more about this shortly), and one of your servers, to see how it goes.
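If you do benchmark them, one convenient knob on Linux is that the congestion control algorithm can be chosen per socket, so the same server binary can be pinned to Westwood for one run and CUBIC for the next. A rough sketch, assuming Linux with the relevant kernel modules (tcp_westwood, tcp_cubic) available and Python 3.6+, which exposes TCP_CONGESTION; the port number is just an example:

```python
import socket


def make_listener(port: int, cc_algorithm: bytes) -> socket.socket:
    """Create a listening socket pinned to a specific congestion control
    algorithm (Linux-only sketch)."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    # The named algorithm must already be loaded, e.g. the tcp_westwood
    # or tcp_cubic kernel module.
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_CONGESTION, cc_algorithm)
    s.bind(("", port))
    s.listen(128)
    return s


# One benchmark pass per algorithm: drive traffic from the generator,
# capture packet traces, then compare.
for algo in (b"westwood", b"cubic"):
    srv = make_listener(9000, algo)  # example port
    # ... run the traffic generator against this listener, capture traces ...
    srv.close()
```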
Now, that middle Linux box should run ns-3 so you can simulate a more complicated path than just an Ethernet switch. You then capture packet traces at the sending end of the TCP connections and analyse them with tcptrace or the tcptrace graphing modes of Wireshark. The tcptrace documentation is a good introduction to analysing TCP congestion behaviour.
That is possible. It's called IPFC, and RFC 2625 specifies it. You need a TCP/IP stack for your HBA. Could you share more details - which HBAs, switches, and so on?