First of all, you need to consider that the air is a shared medium. Unlike switched wired networks, where collisions are rare, collisions in the air are quite frequent, and as a result retransmissions are part of the game. Many clients may compete to send a frame to the access point first, and the losers have to retransmit. It gets worse when the signal from one client is stronger than the others and drowns them out, so that the access point cannot "hear" them. Finally, neighboring wireless networks can also affect you if you are operating on nearby channels, or if you use turbo, super, or similar modes that occupy a wider band and are more easily affected by interference.
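To put a rough number on how often contention ends in a collision, here is a minimal Python sketch of a single round of 802.11-style random backoff. It assumes a simplified model: every station with a frame ready picks a backoff slot uniformly from 16 slots (roughly the minimum contention window of 802.11a/g), and the station(s) with the lowest slot transmit first; if two or more share that slot, their frames collide. It ignores hidden stations, capture effects, and the longer backoff that follows a collision, so treat the numbers as illustrative only.

```python
def collision_probability(n_stations: int, cw: int = 16) -> float:
    """Probability that the earliest backoff slot is chosen by 2+ stations.

    Simplified model: each station picks a slot uniformly from 0..cw-1 and
    the station(s) holding the smallest slot transmit first.
    """
    if n_stations < 1 or cw < 1:
        raise ValueError("need at least one station and one slot")
    p_unique_min = 0.0
    for k in range(cw):
        # Exactly one station picks slot k and every other station picks a later slot.
        p_unique_min += n_stations * (1 / cw) * ((cw - 1 - k) / cw) ** (n_stations - 1)
    return 1.0 - p_unique_min


if __name__ == "__main__":
    for n in (2, 5, 10, 20):
        print(f"{n:2d} busy stations: {collision_probability(n):.1%} chance of collision")
```

The function name and the 16-slot default are illustrative choices, not values taken from any particular driver; the point is only that the collision chance climbs quickly as more clients have traffic queued at the same time.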
I suggest that you scan the area for other Wi-Fi signals and try to use a less congested channel.
Avoid the wide-bandwidth modes; even 11g uses a fairly wide range of frequencies to achieve 54 Mbps.
If you have a lot of local wireless clients, consider adding access points to share the load.
And just to be safe, make sure no other client is running heavy downloading software, such as torrent clients.
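The channel-scan and wide-channel advice above can be sketched as a rough heuristic. This is not a real survey tool: the (channel, RSSI in dBm) scan format, the function names, and the rule that two 2.4 GHz channels interfere when their center frequencies (2407 + 5 * channel MHz for channels 1 to 13) are closer than the channel width are all simplifications made for illustration. Doubling the width from 20 to 40 MHz makes more neighbors count as overlapping, which is the reason to avoid wide modes in a crowded band.

```python
from typing import Sequence, Tuple

# (channel, RSSI in dBm) pairs from a site survey; hypothetical input format.
Scan = Sequence[Tuple[int, float]]


def center_mhz(channel: int) -> int:
    """Center frequency of 2.4 GHz Wi-Fi channels 1 to 13."""
    return 2407 + 5 * channel


def overlapping_power_mw(scan: Scan, candidate: int, width_mhz: int = 20) -> float:
    """Sum the received power (mW) of networks that overlap the candidate channel.

    Simplification: every network is assumed to use the same channel width,
    and two channels overlap when their centers are closer than that width.
    """
    total = 0.0
    for channel, rssi_dbm in scan:
        if abs(center_mhz(channel) - center_mhz(candidate)) < width_mhz:
            total += 10 ** (rssi_dbm / 10)  # convert dBm to milliwatts
    return total


def least_congested(scan: Scan, candidates: Sequence[int] = (1, 6, 11),
                    width_mhz: int = 20) -> int:
    """Pick the candidate channel with the least overlapping signal power."""
    return min(candidates, key=lambda ch: overlapping_power_mw(scan, ch, width_mhz))


if __name__ == "__main__":
    survey = [(1, -45), (2, -60), (6, -50), (7, -70)]  # made-up survey data
    print("best channel:", least_congested(survey))
    print("overlap on 11 at 20 MHz:", overlapping_power_mw(survey, 11, 20))
    print("overlap on 11 at 40 MHz:", overlapping_power_mw(survey, 11, 40))
```

With the made-up survey, channel 11 sees no overlapping networks at 20 MHz but picks up channels 6 and 7 once the width is 40 MHz. Real interference also depends on duty cycle (how busy those neighbors are), which a one-off RSSI scan does not capture.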
Well. Since you've captured packets showing that your Trading Server is sending TCP ACKs with a window size of 0, you at least know the problem is definitely on your side. That is actually a good thing, because you are in a position to fix it. (There is one possible cause that would be a problem on their end; I'll talk about that later.)
You've also traced the issue to periods of increased throughput, which is also a good thing.
You said the CPU/RAM usage on your Trading Server looked normal. Is the application you are using by chance configured to use a limited amount of RAM on the host OS, perhaps a limited percentage? If so, it would stand to reason that as you had more connections and more throughput, less RAM was available to the application, and therefore fewer resources were available for TCP.
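A zero window means the receiving socket buffer is full: the application on your server is not reading data as fast as the exchange sends it. The toy Python model below illustrates that. It is not real TCP (no round-trip time, window scaling, or silly-window avoidance), and the buffer size and rates are made-up numbers. Each tick, the peer sends at most the free buffer space, the ACK advertises whatever space is left, and then the application reads a fixed amount.

```python
from typing import List, Sequence


def advertised_windows(capacity: int, arrivals: Sequence[int],
                       read_per_tick: int) -> List[int]:
    """Toy model of a receiver's advertised TCP window, one value per tick.

    Each tick: the peer sends up to the free buffer space, the receiver's ACK
    advertises the space that is left, then the application reads up to
    read_per_tick bytes out of the socket buffer.
    """
    buffered = 0
    history: List[int] = []
    for offered in arrivals:
        accepted = min(offered, capacity - buffered)  # never overrun the buffer
        buffered += accepted
        history.append(capacity - buffered)           # window carried in the ACK
        buffered -= min(buffered, read_per_tick)      # application drains data
    return history


if __name__ == "__main__":
    # Made-up numbers: 64 KB receive buffer, 20 KB arriving per tick.
    print("slow reader:", advertised_windows(64_000, [20_000] * 10, 10_000))
    print("fast reader:", advertised_windows(64_000, [20_000] * 10, 20_000))
```

With the application reading 10,000 bytes per tick against 20,000 arriving, the window shrinks 44000, 34000, 24000, 14000, 4000 and then sticks at 0; reading 20,000 per tick keeps it at 44000. That is consistent with the idea above: an application that is starved or throttled can produce zero windows even while overall host CPU and RAM look normal.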
Either way, what OS is your Trading Server using? If you haven't already, you should look into tuning the OS to dedicate more RAM to TCP. In Windows, there are Registry values you can modify. In Linux, there are config files you can edit.
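On Linux, a reasonable first step is to look at what the kernel currently allows. The sketch below (Python, standard library only) reads the net.ipv4.tcp_rmem triplet (minimum, default, and maximum receive buffer in bytes) from /proc and asks the OS what receive buffer a fresh TCP socket gets. The helper names are made up for this example, and which values to raise, and by how much, depends on your workload.

```python
import socket
from pathlib import Path
from typing import Optional, Tuple


def parse_tcp_mem_triplet(text: str) -> Tuple[int, int, int]:
    """Parse a 'min default max' line as found in /proc/sys/net/ipv4/tcp_rmem."""
    low, default, high = (int(v) for v in text.split())
    return low, default, high


def read_linux_tcp_rmem(path: str = "/proc/sys/net/ipv4/tcp_rmem") -> Optional[Tuple[int, int, int]]:
    """Return the receive-buffer triplet in bytes, or None if the file is absent (e.g. not Linux)."""
    p = Path(path)
    if not p.exists():
        return None
    return parse_tcp_mem_triplet(p.read_text())


def default_socket_rcvbuf() -> int:
    """Receive buffer size (SO_RCVBUF) the OS assigns to a new TCP socket."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        return s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)


if __name__ == "__main__":
    print("tcp_rmem (min, default, max):", read_linux_tcp_rmem())
    print("new socket SO_RCVBUF:", default_socket_rcvbuf())
```

Changing the values is done with sysctl (for example sysctl -w net.ipv4.tcp_rmem="...") and persisted in /etc/sysctl.conf or a file under /etc/sysctl.d/. Note that net.core.rmem_max caps what an application can request explicitly with SO_RCVBUF.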
It would also be wise to make sure your firewall (or anything else in between) is not proxying your TCP sessions. That way you know you are dealing with the full "client to server" TCP connection, and not something in between.
The last thing I can offer is to study the TCP packets being sent from the Stock Exchange to your server just before your server sends a window size of 0. In particular, look for incoming packets with the value 11 (binary) in the IP header's ECN field (Explicit Congestion Notification: the last two bits of what used to be the Type of Service byte, right after the 6-bit DSCP; bits 14 and 15 if you're counting from the start of the IPv4 header). If both the client and the server in the communication support ECN, and a router in transit detected congestion, it may have set these bits to tell the client and server to slow down their transfers. (This is the possible problem on their end that I mentioned earlier.)
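If you can get at the raw IP headers of those incoming packets, checking the ECN field is a two-bit mask. This is a minimal Python sketch assuming you already have each IPv4 header as bytes (how you extract them from your capture is up to you); the function names are illustrative, and the codepoint names follow RFC 3168.

```python
# Codepoint names from RFC 3168 (Explicit Congestion Notification).
ECN_CODEPOINTS = {
    0b00: "Not-ECT (sender not ECN-capable)",
    0b01: "ECT(1)",
    0b10: "ECT(0)",
    0b11: "CE (Congestion Experienced)",
}


def ecn_bits(ipv4_header: bytes) -> int:
    """Return the 2-bit ECN field of an IPv4 header.

    Byte 1 of the header is the old Type of Service byte: the top six bits
    are DSCP and the bottom two bits (header bits 14 and 15) are ECN.
    """
    if len(ipv4_header) < 20 or ipv4_header[0] >> 4 != 4:
        raise ValueError("expected at least 20 bytes of an IPv4 header")
    return ipv4_header[1] & 0b11


def congestion_experienced(ipv4_header: bytes) -> bool:
    """True if a router marked this packet CE (ECN field == binary 11)."""
    return ecn_bits(ipv4_header) == 0b11


if __name__ == "__main__":
    # Hypothetical header: version 4, IHL 5, DSCP 0, ECN 11; remaining bytes zeroed.
    sample = bytes([0x45, 0x03]) + bytes(18)
    print(ECN_CODEPOINTS[ecn_bits(sample)], congestion_experienced(sample))
```

In Wireshark, the display filter ip.dsfield.ecn == 3 should show the same CE-marked packets without any code.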
I think that (tries to) answer questions 0, 1, and 3. I'll have to dig around a bit more to give you a reliable answer for question 2, but I'm pretty confident there is a way.
Very likely the WAN connection is the major problem. There's little you can do except provide a local server or change the ISP. Additionally, the possibly weak WAN link may be congested and may need an upgrade. Also, the VPN router may be too weak (I've seen Gbit WAN links with a VPN router barely able to handle 20 Mbit/s).
In the local network, make sure routing is working correctly. While ping and traceroute (by IP) are not exact tools, they can usually provide a good starting point for further research. If you've got redundant WAN/VPN links, you'll want to watch out for asymmetric routes: replies coming back over a different link than the one the requests went out on. If you use asymmetric routes, you need to make sure your network, devices, and policies can handle them.
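ping and traceroute use ICMP, which firewalls and VPN devices may handle differently from application traffic. As a complement, you can time a TCP handshake to the port the application actually uses. This is a minimal Python sketch; the function name is made up for this example.

```python
import socket
import time
from typing import Optional


def tcp_connect_ms(host: str, port: int, timeout: float = 2.0) -> Optional[float]:
    """Time a full TCP handshake to host:port in milliseconds.

    Pass an IP address rather than a name to keep DNS lookups out of the
    measurement. Returns None if the connection is refused, times out, or
    fails otherwise.
    """
    start = time.perf_counter()
    try:
        # create_connection returns once the three-way handshake completes.
        with socket.create_connection((host, port), timeout=timeout):
            pass
    except OSError:
        return None
    return (time.perf_counter() - start) * 1000.0
```

For example, running tcp_connect_ms("10.1.0.10", 445) from each site (hypothetical file server address; 445 is the SMB port) a few times gives a rough per-path comparison; a path that is consistently much slower, or fails where ping succeeds, is a good candidate for a closer look.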
Next, make sure DNS is working AOK. With Windows servers you'll need to use Windows DNS or put all the required AD records on the DNS server used locally. Usually, an ISP DNS server is not a good choice. Name resolution over broadcast does not work across VPN locations.
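A quick way to check name resolution is to resolve the names your users depend on through the system resolver from a client at each site, so the test uses whatever DNS server that site is actually configured with. This Python sketch only covers ordinary address lookups; AD also relies on SRV records such as _ldap._tcp.dc._msdcs.<your domain>, which the standard library cannot query, so check those with nslookup -type=SRV.

```python
import socket
from typing import Dict, Iterable, List


def resolve_all(names: Iterable[str]) -> Dict[str, List[str]]:
    """Resolve each name via the system resolver; an empty list means it failed."""
    results: Dict[str, List[str]] = {}
    for name in names:
        try:
            infos = socket.getaddrinfo(name, None)
        except socket.gaierror:
            results[name] = []
            continue
        # Each entry's sockaddr is the 5th element; its first item is the IP address.
        results[name] = sorted({info[4][0] for info in infos})
    return results
```

For example, resolve_all(["dc01.corp.example", "fileserver.corp.example"]) (hypothetical names) should return the same addresses at every site; an empty list at one site points to a resolver there that is missing those records.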
If all else fails, you'll need to run a packet trace and watch a problematic connection in detail. This requires significant insight into the protocols used, and you may need to hire a consultant.
For more detailed suggestions you'd need to provide more details on your network: a diagram, (sanitized) configurations of all relevant devices and a more exact description of your problem.