TCP Window Size Allocation and TCP Zero Window Errors

monitoring tcp wireshark

Greetings Everyone!

My mind is about to explode: everyone always blames the network guys for disconnections and slowness, but SolarWinds reports that utilization on all network devices and links is okay. So I tried sniffing the traffic and got "Zero Window" errors. So, if I may ask:

1.) How is the TCP window size allocated? (Is it per TCP conversation? For example, if you have one application (Mozilla Firefox) with 5 tabs open, does the OS allocate a window per tab?)

2.) What causes a "TCP Zero Window" issue, and how do you fix it? (The Stock Trading Server is the one having a hard time processing burst traffic and sending the TCP Zero Window messages to the traders. But the network utilization in SolarWinds (CPU, memory and link utilization of network devices) and the performance monitoring on both the Stock Trading Server and the Database Server (CPU, disk space, memory, NIC utilization) show that everything is perfectly normal and even under-utilized!)

3.) Is it perhaps in the Trading Server's settings? (It has 32 GB of memory but only uses the default TCP window allocation size of 64 KB.)

4.) Or is the problem how slowly the Trading Application processes the data? (I am planning to increase the TCP buffer size from 64 KB to probably 256 KB, but that might not help if the Trading Application Server itself processes the data slowly.)

5.) Also, all the traders are experiencing "Unable to Connect to Trading Server" and "Intermittent Connections" errors. (But there are no reports of network problems like "down links" or "fully utilized links". I've even changed the polling interval to 1 minute to capture short disconnections, but I still see no problem.) So I think there might be a latency problem.

6.) How do you measure the latency of network communication efficiently? What free and paid software solutions do you recommend? (Traceroute reports 4ms, and even if I increase the ping packet to 1mb, it still shows a 1-3ms delay, so I don't think that's helpful.)

7.) How do you sort out each TCP thread/conversation if the source port and destination port are the same and the data is encrypted? (For example, if the Stock Trading Server and the SQL Server talk on the same port numbers but have multiple transactions going on.)

Sorry, I'm new to the networking world, so there is a lot of stuff I don't know and can't find in books and other resources. I think these kinds of things are learned through experience, so please share your wisdom.

Thank you and Have a Good Day! 🙂

Best Answer

Well, I think you are in luck, as there is a Wireshark forum thread that covers this in detail and describes your situation.

https://ask.wireshark.org/questions/2365/tcp-window-size-and-scaling

Basically, it's not the network; it's more likely the server your traders are all trying to access. The server can't process the data it's getting at the rate it's getting it (i.e., drinking from the firehose), so its TCP receive buffer fills up and it advertises a zero window, which tells the senders to stop until the application catches up. That message is the result.
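To see the mechanism for yourself, here is a minimal Python sketch (my own illustration, not taken from the linked thread). It connects two sockets over loopback, has the receiving side never call recv(), and counts how many bytes the sender can push before the kernel stops accepting more. The function name, the buffer sizes and the 0.5 second stall timeout are all assumptions chosen for the demo, and the OS is free to round or ignore the requested buffer sizes.

```python
import select
import socket


def fill_receiver(chunk_size=4096, bufsize=8192, stall_timeout=0.5):
    """Send to a loopback peer that never reads.

    Returns the number of bytes the kernel accepted before the sender stalled.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Ask for a small receive buffer so the stall happens quickly.
    # (Assumption: the accepted socket inherits it; the OS may adjust the value.)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, bufsize)
    listener.bind(("127.0.0.1", 0))  # port 0 = let the OS pick a free port
    listener.listen(1)

    sender = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sender.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, bufsize)
    sender.connect(listener.getsockname())
    receiver, _ = listener.accept()  # accepted, but recv() is never called

    sender.setblocking(False)
    chunk = b"x" * chunk_size
    sent = 0
    try:
        while True:
            # Wait until the socket is writable; give up once it stays
            # unwritable for stall_timeout seconds (buffers are full).
            _, writable, _ = select.select([], [sender], [], stall_timeout)
            if not writable:
                break
            try:
                sent += sender.send(chunk)
            except BlockingIOError:
                continue
    finally:
        sender.close()
        receiver.close()
        listener.close()
    return sent


if __name__ == "__main__":
    print("bytes accepted before the sender stalled:", fill_receiver())
```

If you capture on the loopback interface while this runs, you should see the receiving side's advertised window shrink as its buffer fills. The point is that the stall has nothing to do with link utilization: it is caused purely by the application not reading its socket.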

As for tools, well, Wireshark ;)

You need network monitoring for long-term bandwidth and error graphing/tracking. Cacti and Nagios (or Icinga) are free applications that run on any Unix/Linux platform. For Windows, you're better off buying a commercial product (you work for a trading firm, so they should be able to spend a small amount on this). You already have SolarWinds; WhatsUp Gold (from Ipswitch, not SolarWinds) would be a decent starting point, and PRTG is also very good.

For immediate tools, read more on ping and traceroute on Windows. The defaults for those commands are absolutely horrible, and tweaking them can give you better and faster results. If you have a Unix/Linux system available, mtr is the trace/path-performance tool of choice for network administrators.
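Since ping and traceroute only tell you about the path, it can also help to time the TCP handshake against the actual service port. The following is a rough Python sketch of that idea; the function names are my own, and the host and port in the usage note are placeholders rather than anything from your environment. It measures how long connect() takes, which covers roughly one round trip plus whatever time the server's stack needs to answer the SYN, so treat it as an approximation rather than a precise latency figure.

```python
import socket
import statistics
import time


def tcp_connect_ms(host, port, samples=5, timeout=2.0):
    """Time the TCP connect (three-way handshake) several times, in milliseconds."""
    results = []
    for _ in range(samples):
        start = time.perf_counter()
        # create_connection raises OSError on failure or timeout,
        # which is itself a useful signal for "Unable to Connect" reports.
        with socket.create_connection((host, port), timeout=timeout):
            elapsed = time.perf_counter() - start
        results.append(elapsed * 1000.0)
    return results


def summarize(times_ms):
    """Reduce a list of timings to min / median / max."""
    return {
        "min": min(times_ms),
        "median": statistics.median(times_ms),
        "max": max(times_ms),
    }
```

Usage would look something like `summarize(tcp_connect_ms("trading-server.example", 443))`, with the hostname and port swapped for your real trading server. If the connect times are consistently a few milliseconds while the traders still see stalls, that again points at the application rather than the network.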