What causes Winsock 10055 errors? How should I troubleshoot them?

windows-server-2003

I'm investigating Winsock 10055 errors on a chain of custom applications (some of which we control, some not) and was hoping to get some advice on techniques to troubleshoot the problem.

No buffer space available.
    An operation on a socket could not be performed because the system 
    lacked sufficient buffer space or because a queue was full.

From my research, non-paged pool and ports seem to be the only resources that can cause this error. Is there another resource that might cause 10055 errors?

Currently, we have perfmon counters set up on the applications, and non-paged pool usage looks low in most circumstances. The number of open TCP connections looks low, and I am unaware of another way to monitor ports.
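One low-impact way to look at port usage beyond the perfmon connection counter is to summarize `netstat -ano` output, which also lists connections in states such as TIME_WAIT that still hold a local port. The following is a minimal sketch, not a definitive tool: the function names are made up for illustration, and the 1025-5000 range assumes the Windows Server 2003 default ephemeral range with no `MaxUserPort` override.

```python
from collections import Counter
import subprocess

# Assumption: default ephemeral port range on Windows Server 2003
# when MaxUserPort has not been changed in the registry.
EPHEMERAL_RANGE = (1025, 5000)


def summarize_netstat(text, port_range=EPHEMERAL_RANGE):
    """Summarize TCP rows of `netstat -ano` output.

    Returns (rows per state, rows per owning PID, number of distinct
    local ports inside port_range seen in any TCP row).
    """
    by_state, by_pid = Counter(), Counter()
    ports_in_range = set()
    for line in text.splitlines():
        fields = line.split()
        # TCP rows have five columns: Proto, Local, Foreign, State, PID.
        if len(fields) != 5 or fields[0].upper() != "TCP":
            continue
        _proto, local, _remote, state, pid = fields
        by_state[state] += 1
        by_pid[pid] += 1
        port = local.rsplit(":", 1)[-1]
        if port.isdigit() and port_range[0] <= int(port) <= port_range[1]:
            ports_in_range.add(int(port))
    return by_state, by_pid, len(ports_in_range)


def capture_netstat():
    """Run netstat locally and return its output as text (hypothetical helper)."""
    return subprocess.check_output(["netstat", "-ano"]).decode("ascii", "replace")
```

Calling `summarize_netstat(capture_netstat())` on a schedule and logging the three results would show whether one PID or one state (for example TIME_WAIT) grows before the disconnects.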

Since it only happens in production, we are unable to use more invasive counters, though it would still be interesting to hear other solutions. I'm sure other people could use the information.

Is there some other tool or procedure you would recommend to diagnose which application is causing the issue?

UPDATE:

The platform is Windows Server 2003 x86 with the /3GB switch. For reference, x86 generally has a 256 MB limit on non-paged pool (NPP); /3GB lowers it to 128 MB. In general, you'd want to avoid this configuration if NPP exhaustion is a concern. (reference)

We have the source to one application. I have written fairly elaborate test harnesses to try to reproduce the behavior, to no avail.
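For the application whose source is available, one option is to tag 10055 failures explicitly at the point where a socket call fails, so the logs show which operation and peer were involved. This is only a sketch in Python for illustration (the application's actual language is not stated here), and the helper names are hypothetical.

```python
import time

WSAENOBUFS = 10055  # Winsock "No buffer space available"


def is_no_buffer_space(exc):
    """Return True if an OSError carries Winsock error 10055.

    On Windows the code may be in `winerror`; `errno` is checked as well.
    """
    codes = (getattr(exc, "winerror", None), getattr(exc, "errno", None))
    return WSAENOBUFS in codes


def describe_failure(exc, operation, peer):
    """Build a log record for a failed socket call (hypothetical helper)."""
    return {
        "time": time.strftime("%Y-%m-%d %H:%M:%S"),
        "operation": operation,   # e.g. "connect", "send", "recv"
        "peer": peer,             # e.g. "10.0.0.9:80"
        "kind": "WSAENOBUFS" if is_no_buffer_space(exc) else "other",
        "detail": str(exc),
    }
```

Wrapping each socket call in `try`/`except OSError` and logging `describe_failure(...)` would at least narrow down whether the error appears on connect (which points toward ports) or on send/receive (which points toward buffers).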

As mentioned, the problem only happens in production. As such, packet monitoring has been avoided. We currently have performance counters set up that monitor NPP, threads, network traffic, etc. Since perfmon's sampling interval is 1 second, microbursts could come and go within that window. There is somewhat subjective evidence that this isn't the problem, however.
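To illustrate the microburst concern, the sketch below compares what a once-per-second sampler records with the true peak inside each second. It assumes a hypothetical feed of `(timestamp_seconds, value)` readings taken more often than once per second; nothing here is a perfmon API.

```python
def peak_per_window(samples, window=1.0):
    """Return {window_index: highest value seen in that window}."""
    peaks = {}
    for t, value in samples:
        idx = int(t // window)
        if value > peaks.get(idx, float("-inf")):
            peaks[idx] = value
    return peaks


def first_per_window(samples, window=1.0):
    """Return {window_index: first value in that window}.

    This approximates a sampler that reads once at the start of each window.
    """
    seen = {}
    for t, value in samples:
        seen.setdefault(int(t // window), value)
    return seen


# Hypothetical readings: a spike at t=0.3 s that is gone by t=0.6 s.
readings = [(0.0, 10), (0.3, 95), (0.6, 12), (1.0, 11), (1.5, 13)]
print(first_per_window(readings))  # the once-per-second view
print(peak_per_window(readings))   # the per-second maxima
```

If a resource can be polled cheaply inside a process, tracking the per-second maximum rather than an instantaneous value is one way to keep a 1-second log without hiding short spikes.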

The basic situation is that the other side of the connection says that it has closed the connection due to errors with 10055 as the code. NPP (and performance in general) looks stable prior to disconnects, which points to some other resource being the cause.

UPDATE:

I'd also reiterate that the original questions pertain to diagnosis, not solutions. I still don't have a clear answer on what causes 10055 errors. Checking drivers and hardware and reinstalling operating systems is great, but it sidesteps the original question.

Best Answer

According to what I found through Google searches, RAM shortages can also cause this, among other things. One error condition I came across was a low-memory situation in which the base operating system had little access to RAM. My guess is that the same type of problem could easily be recreated in a virtual environment that is starved for RAM.

A more fundamental troubleshooting question is quite simply: what is different about your production environment?

Have you tested the application in Windows 2003 x64 or Windows 2008?

On to the second part of your question.

The following tools can be used for troubleshooting and fixing Winsock errors.

Sniffers:

http://www.wireshark.org/ 

Shims:

http://www.sstinc.com/winsock.html
http://www.win-tech.com/html/socktspy.htm

General-purpose tools to track system status and resources:

http://technet.microsoft.com/en-us/sysinternals/bb896653.aspx
http://technet.microsoft.com/en-us/sysinternals/bb896645

Tools to detect API calls:

http://www.apimonitor.com/
http://www.nektra.com/products/spystudio-api-monitor/

Debuggers:

http://www.ollydbg.de/
http://www.immunitysec.com/products-immdbg.shtml

Reversing tools or decompilers:

http://www.hex-rays.com/products/ida/index.shtml
http://www.hex-rays.com/products/decompiler/index.shtml

Your standard IDE and compiler:

http://www.microsoft.com/visualstudio/en-us

Here is a list of other tools:

http://www.sockets.com/devtools.htm

Other references found:

https://stackoverflow.com/questions/8118870/howto-debug-winsock-api-calls

http://brandon.fuller.name/archives/2007/01/24/19.44.29/

http://tangentsoft.net/ <---- Probably the best one