Ubuntu – Server Dropping all Connections Randomly and Packet Loss

Tags: packetloss, tcp, ubuntu, ubuntu-10.04

I just built a server using a Supermicro X8DAH+-F board, running Ubuntu 10.04 Server 64-bit. The board has the Intel 82576 dual-port controller (one port is disabled). Since this is a server, remote access is imperative.

The server is connected to a switch (DLink), and the switch is connected to a router running DD-WRT (Netgear WNR3500v2/U/L).

eth1      Link encap:Ethernet  HWaddr 00:25:90:03:c9:b9  
          inet addr:192.168.0.100  Bcast:192.168.0.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:7655 errors:0 dropped:0 overruns:0 frame:0
          TX packets:5772 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:7179394 (7.1 MB)  TX bytes:919727 (919.7 KB)
          Memory:fbc60000-fbc80000 

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:637 errors:0 dropped:0 overruns:0 frame:0
          TX packets:637 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:96955 (96.9 KB)  TX bytes:96955 (96.9 KB)

I am pulling my hair out. This server randomly drops all connections. If I am logged in via SSH, the session gets disconnected anywhere from immediately after login to 30 minutes later. Once the connections are dropped, it takes several minutes for services to come back up.

I decided to run a 24-hour ping test from the server to the router. The disconnections coincide with random periods of high packet loss between the NIC and the router.
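For anyone wanting to line up the loss periods with the disconnects, here is a minimal Python sketch that scans saved ping output for gaps in the sequence numbers. The function name `find_loss_bursts`, the router address 192.168.0.1 and the sample lines are my own assumptions, not taken from the actual test. It also assumes replies arrive in order, which is normally true on a LAN.

```python
import re

# Matches both "icmp_seq=" and the "icmp_req=" spelling some ping builds print.
SEQ_RE = re.compile(r"icmp_(?:seq|req)=(\d+)")


def find_loss_bursts(lines, wrap=65536):
    """Return (last_seq_before_gap, lost_count) for every gap in the replies.

    ICMP sequence numbers are 16-bit, so a long test wraps around;
    the modulo keeps a wrap from being reported as a huge loss.
    """
    bursts = []
    prev = None
    for line in lines:
        match = SEQ_RE.search(line)
        if match is None:
            continue  # header, summary, or error line
        seq = int(match.group(1))
        if prev is not None:
            lost = (seq - prev) % wrap - 1
            if lost > 0:
                bursts.append((prev, lost))
        prev = seq
    return bursts


# Hypothetical sample output; 192.168.0.1 is an assumed router address.
sample = [
    "PING 192.168.0.1 (192.168.0.1) 56(84) bytes of data.",
    "64 bytes from 192.168.0.1: icmp_seq=1 ttl=64 time=0.41 ms",
    "64 bytes from 192.168.0.1: icmp_seq=2 ttl=64 time=0.39 ms",
    "64 bytes from 192.168.0.1: icmp_seq=6 ttl=64 time=0.40 ms",
    "64 bytes from 192.168.0.1: icmp_seq=7 ttl=64 time=0.42 ms",
]
print(find_loss_bursts(sample))  # [(2, 3)]: three replies lost after seq 2
```

If the ping output is saved to a file, passing the open file object as `lines` should work, since the function only iterates over lines.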

The server is not overloaded with I/O processes or CPU processes and I am the only one using it.

Things I have tried, to no avail:

  • Swapping cables
  • Swapping routers
  • Swapping ports on the router
  • Removing network-manager (Ubuntu)
  • Disabling all firewalls
  • Disabling iptables
  • Restarting all of the services manually

I am considering buying a PCIe NIC, but I want to ask in case there is something I am overlooking.

Best Answer

One thing you might want to verify is that no other machine or device on the network is "stealing" the server's IP address. If you cannot find that information in your network equipment, there is always the option of running an arpwatch daemon on a suitable machine on the same local network.
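If installing arpwatch is not convenient, a rough stand-in is to snapshot the ARP table on another Linux machine on the same LAN a few times and check whether the server's IP ever shows up with more than one MAC address. The Python sketch below is not arpwatch. It assumes the snapshots are in the Linux /proc/net/arp format. The helper names and the second MAC address (00:11:22:33:44:55) are hypothetical; only 192.168.0.100 and 00:25:90:03:c9:b9 come from the ifconfig output in the question.

```python
from collections import defaultdict


def parse_proc_net_arp(text):
    """Extract (ip, mac) pairs from text in /proc/net/arp format."""
    pairs = []
    for line in text.splitlines()[1:]:  # skip the column header
        fields = line.split()
        # Columns: IP address, HW type, Flags, HW address, Mask, Device
        if len(fields) >= 6 and fields[3] != "00:00:00:00:00:00":
            pairs.append((fields[0], fields[3].lower()))
    return pairs


def find_conflicts(observations):
    """Return {ip: [macs]} for every IP seen with more than one MAC."""
    seen = defaultdict(set)
    for ip, mac in observations:
        seen[ip].add(mac)
    return {ip: sorted(macs) for ip, macs in seen.items() if len(macs) > 1}


HEADER = "IP address       HW type     Flags       HW address            Mask     Device\n"

# Two hypothetical snapshots taken at different times on another LAN host.
snapshot_1 = HEADER + "192.168.0.100    0x1         0x2         00:25:90:03:c9:b9     *        eth0\n"
snapshot_2 = HEADER + "192.168.0.100    0x1         0x2         00:11:22:33:44:55     *        eth0\n"

observations = parse_proc_net_arp(snapshot_1) + parse_proc_net_arp(snapshot_2)
print(find_conflicts(observations))
# {'192.168.0.100': ['00:11:22:33:44:55', '00:25:90:03:c9:b9']}
```

A MAC change for the server's IP during one of the outages would point strongly at an address conflict. Seeing no change does not rule one out, since the snapshots only catch whatever entry the kernel holds at that moment.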