How to troubleshoot excessive latency between client and server

latency

On a large private network, there seems to be excessive latency between two boxes that I control. It is causing timeouts, which stops the server from providing a web service to the client.

Here's a slow-running request logged in Charles:

Timing
Request Start Time       03/11/10 23:21:33
Request End Time         03/11/10 23:21:33
Response Start Time      03/11/10 23:21:42
Response End Time        03/11/10 23:21:42
Duration                 8.99 sec
Request Duration         16 ms
Response Duration        0 ms
Latency                  8.97 sec
Speed                    1.30 KB/s
Response Speed           ∞ KB/s

Size
Request Header Size      412 bytes
Response Header Size     151 bytes
Request Size             -
Response Size            11.17 KB (11436 bytes)
Total Size               11.72 KB (11999 bytes)
Request Compression      -
Response Compression     -

As a test, I tried fetching a 423 KB image file via a browser, but it downloads so slowly that it stalls.

How can I troubleshoot where the problem lies?

I tried using PingPlotter:
[image: PingPlotter trace]

Looks like hops 3 and 4 are the culprits? Where do I go from here?

ANSWER

This turned out to be a problem with the client box, which runs Windows Web Server 2008 R2. There is a known network speed problem on that OS when the router cannot handle TCP window scaling. Updating the router software resolved the problem.
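As a rough illustration of why the TCP window matters here: a sender can have at most one window of unacknowledged data in flight per round trip, so throughput is bounded by window size divided by round-trip time. The sketch below applies that bound. The 50 ms round-trip time and the scale shift of 8 are illustrative assumptions, not values measured on this network, and `max_tcp_throughput` is a hypothetical helper name.

```python
def max_tcp_throughput(window_bytes, rtt_seconds):
    """Upper bound on TCP throughput in bytes/sec: one window per round trip."""
    return window_bytes / rtt_seconds

# Largest window that fits in the 16-bit TCP header field (no scaling).
UNSCALED_WINDOW = 65535

# Assumed round-trip time, for illustration only.
rtt = 0.05

# Assumed window scale shift of 8, i.e. the advertised window is multiplied by 2**8.
scaled_window = UNSCALED_WINDOW << 8

print("without scaling:", max_tcp_throughput(UNSCALED_WINDOW, rtt), "bytes/sec")
print("with scaling:   ", max_tcp_throughput(scaled_window, rtt), "bytes/sec")
```

If a device on the path mishandles the scale option, the two ends can disagree about the real window size, and the effective window can end up far smaller than either side intended.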

Best Answer

Attempt to replicate the issue from as many different points as possible. You'll probably find something in common between them (the server they are using, a switch, etc.). Collect traceroute data from each endpoint.
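Once you have traceroute output from several endpoints, finding the shared hops can be automated. This is a minimal sketch, assuming you have already reduced each traceroute to an ordered list of hop addresses. The endpoint names and addresses are made up for illustration, and `common_hops` is a hypothetical helper.

```python
def common_hops(traces):
    """Return the hops present in every trace, in the order of the first trace.

    `traces` maps an endpoint name to the ordered list of hop addresses
    that traceroute reported from that endpoint.
    """
    hop_lists = list(traces.values())
    if not hop_lists:
        return []
    shared = set(hop_lists[0])
    for hops in hop_lists[1:]:
        shared &= set(hops)
    # Preserve path order so the result reads like a route.
    return [hop for hop in hop_lists[0] if hop in shared]

# Hypothetical sample data, not taken from the network in the question.
traces = {
    "client-a": ["10.0.1.1", "10.0.0.1", "10.0.9.1", "10.0.9.20"],
    "client-b": ["10.0.2.1", "10.0.0.1", "10.0.9.1", "10.0.9.20"],
    "client-c": ["10.0.3.1", "10.0.9.1", "10.0.9.20"],
}

print(common_hops(traces))  # ['10.0.9.1', '10.0.9.20']
```

If only the slow endpoints share a hop that the fast ones avoid, that hop is the first thing to look at.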

It's always better to use the onion approach: add or remove layers. You can start from the client and move closer and closer to the server, or the inverse.
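One way to apply the layer-by-layer idea is to time a TCP handshake to a reachable service at each layer, working outward from the client. This is only a sketch: `tcp_connect_time` and `probe_layers` are hypothetical helpers, the targets in the comment are placeholders, and it assumes each layer has some TCP port you are allowed to connect to.

```python
import socket
import time

def tcp_connect_time(host, port, timeout=5.0):
    """Return the seconds taken to complete a TCP handshake, or None on failure."""
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.perf_counter() - start
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return None

def probe_layers(targets, timeout=5.0):
    """Time a TCP connect to each (label, host, port) target, in the given order."""
    return [
        (label, tcp_connect_time(host, port, timeout))
        for label, host, port in targets
    ]

# Example call with placeholder targets, ordered from the client toward the server:
# probe_layers([
#     ("local switch mgmt", "192.0.2.1", 22),
#     ("core router", "192.0.2.254", 22),
#     ("web server", "192.0.2.80", 80),
# ])
```

The first layer where the connect time jumps, or where the result becomes None, marks the boundary to investigate.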

Make sure to check the interface statistics for error counts. I find it very useful to fire up tcpdump/Wireshark on both ends and capture the packets from the TCP session, then compare both. Wireshark does a good job of pointing out the most obvious problems (like checksum errors or retransmissions).
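Comparing the two captures can also be scripted once the TCP sequence numbers have been exported from each side. The sketch below assumes you already have two plain lists of sequence numbers for one direction of the session, one from the sender's capture and one from the receiver's. It does not read capture files itself, and the sample numbers are invented.

```python
from collections import Counter

def compare_captures(sent_seqs, received_seqs):
    """Compare sequence numbers seen at the sender and at the receiver.

    Returns (retransmitted, never_arrived):
      retransmitted  - sequence numbers the sender put on the wire more than once
      never_arrived  - sequence numbers sent but absent from the receiver's capture
    """
    sent_counts = Counter(sent_seqs)
    received = set(received_seqs)
    retransmitted = sorted(seq for seq, n in sent_counts.items() if n > 1)
    never_arrived = sorted(seq for seq in sent_counts if seq not in received)
    return retransmitted, never_arrived

# Hypothetical sample data: 2921 is sent twice, 4381 never reaches the receiver.
sent = [1, 1461, 2921, 2921, 4381, 5841]
received = [1, 1461, 2921, 5841]

retransmitted, never_arrived = compare_captures(sent, received)
print("retransmitted:", retransmitted)  # [2921]
print("never arrived:", never_arrived)  # [4381]
```

Segments that leave the sender but never show up at the receiver point to loss somewhere on the path, which is where the interface error counters on the intermediate devices become useful.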