Windows Server 2008 occasional connection drops troubleshooting

Tags: connection, dell-poweredge, networking, windows-server-2008

I'm seeing occasional (2 or 3 times/day) "connection drops" on a Windows Server 2008 R2 physical server, running on a Dell R710. I use the term "connection drops" because I don't know how to describe it otherwise, but I mean the following:

  • Server stops responding to ping
  • Any RDP connections (or other types of remote connections) will stall and eventually time out
  • Any connections to the SQL Server database or IIS running on this server will stall/time out

This seems to last anywhere from 30 seconds to 1 minute. After that, the server comes back up, responds to ping and just resumes all of its services as if nothing ever happened.

This server runs the following services:

  • SQL Server 2005 database (2 databases and reporting)
  • IIS7 web server (running 2 custom services and 1 reporting site)

Obviously, I'd like to find out what is causing this. There is nothing in the server's event logs or other monitoring parameters that I can see that indicates any issue in particular. Any tips on how to try to narrow down what is causing this issue?

It's worth taking into account the following facts:

  • We have 5 other servers (3 of which are R410s) running in the same rack, on the same network, none of which seem to display this issue
  • The handle count from the performance view in process manager sits around 40,000 handles (of which lsass.exe seems to take ~7000)
  • I've restarted IIS to see whether the custom services are somehow causing this; if they are, I shouldn't see the issue for the next couple of days/weeks

Update 1: DRAC is still accessible when this issue occurs. This is a very strange issue. I think we'll have to trial & error this by trying various solutions and checking the results.

Update 2: I have spoken to the network guys, and they confirmed that for some reason our server's MAC address is repeatedly being removed from the switch's ARP table. The exact cause is not yet known (it could be a dodgy cable connecting the server to the switch, or the NIC repeatedly going to sleep). We've started a continuous ping to the default gateway, and are looking to replace the cable.
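For anyone wanting to do the same, a continuous ping is more useful if the failures are timestamped, so the drops can be lined up against switch logs. The following is only a rough sketch of that idea, not what we actually ran: the function names (`ping_once`, `find_outages`, `monitor`) are made up for illustration, and it assumes the stock Windows `ping` flags (`-n`, `-w`) or Linux-style flags elsewhere.

```python
import platform
import subprocess
import time
from datetime import datetime


def ping_once(host, timeout_s=1):
    """Send one ICMP echo via the system ping tool; True if it exits with 0.

    Caveat: on Windows, ping may still exit with 0 when the reply is
    "Destination host unreachable", so treat this as a coarse check.
    """
    if platform.system() == "Windows":
        cmd = ["ping", "-n", "1", "-w", str(timeout_s * 1000), host]
    else:
        # Assumption: Linux-style flags (-c count, -W timeout in seconds).
        cmd = ["ping", "-c", "1", "-W", str(timeout_s), host]
    result = subprocess.run(cmd, stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    return result.returncode == 0


def find_outages(samples):
    """Collapse (timestamp, ok) samples into (first_fail, last_fail) windows."""
    outages = []
    start = None
    last_fail = None
    for ts, ok in samples:
        if not ok:
            if start is None:
                start = ts
            last_fail = ts
        elif start is not None:
            outages.append((start, last_fail))
            start = None
    if start is not None:
        # The run ended while the host was still unreachable.
        outages.append((start, last_fail))
    return outages


def monitor(host, interval_s=1.0, duration_s=60.0):
    """Ping host every interval_s seconds for duration_s; return outage windows."""
    samples = []
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        now = datetime.now()
        ok = ping_once(host)
        if not ok:
            print(f"{now.isoformat(timespec='seconds')} no reply from {host}")
        samples.append((now, ok))
        time.sleep(interval_s)
    return find_outages(samples)
```

Calling something like `monitor("<default gateway IP>", duration_s=86400)` from the affected server would then return the start and end of each drop over a day, which should show whether they really last the 30 to 60 seconds described above.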

Best Answer

If you're using multiple NICs on this machine, then make sure that you only have one default gateway defined.

We had a problem like this recently and it transpired that the NICs used for the back end networks (192.168.x.x) had default gateways specified.
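A quick way to check for this is to look for more than one 0.0.0.0/0.0.0.0 entry under "Active Routes" in the output of `route print`. As a rough sketch, assuming the usual five-column layout of that output, the check could be scripted like this; `default_routes`, `read_route_table` and the sample addresses are hypothetical and only there for illustration.

```python
import subprocess

# Hypothetical sample of the "Active Routes" part of `route print` output;
# the addresses and metrics are invented for illustration only.
SAMPLE_OUTPUT = """\
Active Routes:
Network Destination        Netmask          Gateway       Interface  Metric
          0.0.0.0          0.0.0.0         10.0.0.1        10.0.0.25     10
          0.0.0.0          0.0.0.0      192.168.1.1      192.168.1.25     20
        127.0.0.0        255.0.0.0         On-link         127.0.0.1    306
"""


def default_routes(route_print_output):
    """Return the IPv4 default routes found in `route print` style text."""
    routes = []
    for line in route_print_output.splitlines():
        fields = line.split()
        # Active default routes have five columns and start with 0.0.0.0 0.0.0.0.
        # Persistent-route lines have only four columns, so they are skipped.
        if len(fields) == 5 and fields[0] == "0.0.0.0" and fields[1] == "0.0.0.0":
            routes.append({
                "gateway": fields[2],
                "interface": fields[3],
                "metric": fields[4],
            })
    return routes


def read_route_table():
    """Run `route print` on the local (Windows) machine and return its text."""
    result = subprocess.run(["route", "print"], capture_output=True, text=True)
    return result.stdout


def report(routes):
    """Print a short summary of the default routes that were found."""
    if len(routes) > 1:
        print(f"Warning: {len(routes)} default gateways defined:")
    for r in routes:
        print(f"  gateway {r['gateway']} via {r['interface']} (metric {r['metric']})")
```

On the server itself, `report(default_routes(read_route_table()))` would list the gateways; with the sample text above, `default_routes(SAMPLE_OUTPUT)` finds two, which is the situation to avoid. Any gateway listed against a back end NIC is the one to remove.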
