Windows – Troubleshooting Network Speeds — The Age Old Inquiry

network-speed, networking, tcp, wide-area-network, windows

I'm looking for help with what I'm sure is an age-old question. I want to understand network throughput more clearly, but I can't seem to find information that makes it "click".

We have a few servers distributed geographically, running various versions of Windows. Assuming we always use one host (a desktop) as the source, when copying data from that host to other servers across the country, we see a high variance in speed. In some cases we can copy data at 12 MB/s consistently; in others we're seeing 0.8 MB/s. After testing 8 destinations, we always seem to land at either 0.6-0.8 MB/s or 11-12 MB/s, with nothing in between. In the building we're primarily concerned with, we have an OC-3 connection to our ISP.

I know there are a lot of variables at play, but I guess I was hoping the experts here could help answer a few basic questions to help bolster my understanding.

1.) For older machines running Windows XP, Server 2003, etc., with a 100 Mbps Ethernet card and 72 ms typical latency, does 0.8 MB/s sound at all reasonable? Or is that slow enough to indicate a problem?

2.) The classic "mathematical fastest speed" of "throughput = TCP window / latency" works out, in our case, to roughly 0.8-0.9 MB/s (64 KB / 72 ms). My understanding is that this is an upper bound: you would never expect to reach that speed (due to overhead), let alone surpass it. In some cases, though, we're seeing speeds of 12.3 MB/s. There are Steelhead accelerators scattered around the network; could those account for such a high transfer rate?

3.) It's been suggested that the use of SMB vs. SMB2 could explain the differences in speed. As expected, packet captures show both being used, depending on the OS versions in play. I understand what determines whether SMB2 is used, but I'm curious to know what kind of performance gain you can expect with SMB2.

My problem simply seems to be a lack of experience and, more importantly, perspective in terms of what are and are not reasonable network speeds. Could anyone help impart some context/perspective?

Best Answer

The mathematical formula you are referring to is actually the way to determine the most efficient transmit window size settings for TCP, not the actual bandwidth available. TCP uses a mechanism called sliding windows that allows for adjustment of transmit speeds based on network conditions. The idea is that a TCP transmitter will send more and more data without waiting for an acknowledgement from the receiver. If data is lost, the amount of data sent between acknowledgements decreases, which also decreases the effective throughput.
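To make the grow-and-back-off behaviour concrete, here is a deliberately simplified toy model in Python. It is only a sketch: the function name, the 4 KB step, the 64 KB cap and the "halve on loss" rule are assumptions chosen for illustration, not a description of how any particular Windows TCP stack behaves.

```python
def simulate_window(rounds, loss_rounds, start_kb=4, step_kb=4, max_kb=64):
    """Return the window size (in KB) used during each round trip.

    Toy model: the window grows by step_kb per round trip while data is
    acknowledged, and is halved after a round trip in which data was lost.
    """
    window = start_kb
    history = []
    for r in range(rounds):
        history.append(window)
        if r in loss_rounds:
            # Loss detected: back off, but never below the starting size.
            window = max(start_kb, window // 2)
        else:
            # Everything acknowledged: allow more data in flight next time.
            window = min(max_kb, window + step_kb)
    return history


# Ten round trips with a single loss during round 5 (hypothetical scenario).
history = simulate_window(rounds=10, loss_rounds={5})
print(history)
```

The point of the toy is only that a single loss event costs several round trips of reduced window, and on a 72 ms path each of those round trips is expensive.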

The formula in question actually determines the ideal size of that TCP transmit window based on the bandwidth and round-trip latency between a given pair of hosts. The idea is to have a window sized such that the amount of data 'in flight' corresponds to what's known as the bandwidth-delay product. For example, if you have 50 megabits per second (6.25 megabytes per second) and an average round-trip latency of 100 ms, then you'd have 6.25 MB/s * 0.1 s = 0.625 MB, or 625 kilobytes, of data in flight. This would be the window size that TCP would settle on (if configured correctly). As the latency and bandwidth characteristics of your links vary, so too does the window size.
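The same arithmetic can be run in both directions, as in this small Python sketch. The function names are made up for illustration; the inputs are the 50 Mbit/s and 100 ms example above plus the 64 KB window, 72 ms latency and 12 MB/s figures from the question.

```python
def bdp_bytes(bandwidth_bits_per_s, rtt_s):
    """Bandwidth-delay product: bytes that must be in flight to keep the link full."""
    return bandwidth_bits_per_s / 8 * rtt_s


def window_limited_bytes_per_s(window_bytes, rtt_s):
    """Ceiling for one stream if at most one window is delivered per round trip."""
    return window_bytes / rtt_s


# Example from the answer: 50 Mbit/s at 100 ms round trip.
answer_example = bdp_bytes(50e6, 0.100)

# Figures from the question: a fixed 64 KB window at 72 ms round trip.
ceiling_64k = window_limited_bytes_per_s(64 * 1024, 0.072)

# Window a single stream would need to sustain 12 MB/s at 72 ms.
window_for_12mb = bdp_bytes(12e6 * 8, 0.072)

print(f"BDP for 50 Mbit/s at 100 ms: {answer_example / 1000:.0f} kB")
print(f"Ceiling with a 64 KB window at 72 ms: {ceiling_64k / 1e6:.2f} MB/s")
print(f"Window needed for 12 MB/s at 72 ms: {window_for_12mb / 1000:.0f} kB")
```

With these inputs the 64 KB window tops out at about 0.91 MB/s, while 12 MB/s over the same path would need roughly 864 kB in flight, which is why the two clusters of results in the question are so far apart.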

What you need is a bandwidth measurement tool like iperf (free) running on both the source and your various destinations. This should give you an idea of the actual throughput possible (independent of other applications) while also providing some insight into latency. Running an extended ping between hosts will also give you a general idea of latency characteristics. Once you have this data, you'll have a better idea of what throughput you should be seeing.
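If you save the output of an extended ping to a file, a few lines of Python are enough to boil it down to the numbers that matter. This is a rough sketch that assumes the usual Windows `ping` reply format ("time=72ms", "Request timed out."); the sample lines and the address in them are invented for illustration and are not real measurements.

```python
import re
import statistics

# Hypothetical sample in the format Windows ping prints; not real measurements.
sample_output = """\
Reply from 10.0.0.5: bytes=32 time=72ms TTL=118
Reply from 10.0.0.5: bytes=32 time=71ms TTL=118
Request timed out.
Reply from 10.0.0.5: bytes=32 time=75ms TTL=118
"""


def parse_rtts_ms(ping_text):
    """Extract round-trip times (ms) from 'time=NNms' or 'time<NNms' fields."""
    return [int(ms) for ms in re.findall(r"time[=<](\d+)ms", ping_text)]


def summarize(ping_text):
    """Return min/avg/max round-trip time and the percentage of lost probes."""
    rtts = parse_rtts_ms(ping_text)
    if not rtts:
        raise ValueError("no replies found in ping output")
    lost = ping_text.count("Request timed out")
    return {
        "min_ms": min(rtts),
        "avg_ms": statistics.mean(rtts),
        "max_ms": max(rtts),
        "loss_pct": 100.0 * lost / (len(rtts) + lost),
    }


summary = summarize(sample_output)
print(summary)
```

The average round-trip time feeds straight into the bandwidth-delay calculation, and a non-zero loss percentage is a hint that the window is being cut back regularly.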

BTW - Any kind of WAN optimizer (such as the Steelheads you mention) will often incorporate data compression, TCP optimization, caching, etc. While handy, it can obscure the nature of the underlying links. Once you have an idea of the raw bandwidth and delay (and, potentially, packet loss), you can take a closer look to make sure your various hosts are set up to take proper advantage of the available bandwidth.