How to diagnose intermittent connectivity issues

monitoring, network-monitoring, networking

At work we are suffering intermittent faults with our internet connection. For a few minutes at a time, none of our desktop computers can make any external requests, including requests made directly to IP addresses. We were blaming our ISP, but during the most recent outage I connected to one of our servers via SSH and realised it could still ping external hosts.

Our network infrastructure is as follows.

VDSL GATEWAY
     |
     |
     |
WIRELESS ROUTER-------------------------------------------
     |                             |                     |
     |                             |                     |
Active directory Server          SERVER                Switch
                                                      | | | | |
                                                      Desktop Computers

The VDSL gateway is configured over PPPoE on the Wireless router.

Facts

  • During the last outage, SERVER (marked above) could still make external requests; it kept pinging Google successfully. (This should imply that the VDSL gateway and wireless router are working.)
  • All desktop computers lose their external connectivity, but I could still make requests to SERVER, which implies that the switch is working correctly.
  • Outage only lasts short periods of time.
  • Wireless devices also lose connectivity, indicating that the problem is in the wireless router or upstream of it.
  • All machines get their DNS through the Active Directory server. However, the problem also occurs with direct IP requests, so DNS should not be the cause.
  • SERVER is running CentOS.
  • Desktop machines are a mixture of Windows (predominantly), Apple Mac, and one installation of Ubuntu.
  • When the network goes down we lose VPN connectivity.

I don't have any traceroute data at the moment.

How should I go about diagnosing the issues seen on the network? Ideally I'd like to be able to monitor exactly when the issues occur (log ping requests?) from at least my own Ubuntu machine and the CentOS server, and perhaps run a traceroute whenever an external ping fails.

The network is configured to use IPv4.
The network settings have Wireless Router set as public gateway.

Current theories
– The SERVER goes through a different route.
– I'm mad and the situation makes no sense.

Other things to note: all of the desktop machines connect to a single Ethernet port on the wireless router via a switch.

Best Answer

Have you checked dmesg on the server? It sounds like you may have a hardware or driver issue on your internal network card, or maybe something like the connection tracking table getting too full.
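As a rough sketch of the connection tracking check, the Python below reads the current and maximum entry counts from procfs and filters dmesg output for the kernel's "table full, dropping packet" warning. It assumes a Linux box with the nf_conntrack module loaded and the usual `/proc/sys/net/netfilter/` paths (older kernels may expose these elsewhere); the function names are my own, purely for illustration.

```python
from pathlib import Path

# Assumption: standard procfs paths on kernels with nf_conntrack loaded.
COUNT_PATH = Path("/proc/sys/net/netfilter/nf_conntrack_count")
MAX_PATH = Path("/proc/sys/net/netfilter/nf_conntrack_max")


def usage_fraction(count_text, max_text):
    """Return tracked connections as a fraction of the table limit."""
    count = int(count_text)
    maximum = int(max_text)
    return count / maximum if maximum else 0.0


def read_conntrack(count_path=COUNT_PATH, max_path=MAX_PATH):
    """Return the usage fraction, or None if conntrack data is unavailable."""
    try:
        return usage_fraction(count_path.read_text(), max_path.read_text())
    except (FileNotFoundError, PermissionError):
        return None


def table_full_lines(dmesg_text):
    """Return dmesg lines reporting that the conntrack table overflowed."""
    return [
        line
        for line in dmesg_text.splitlines()
        if "table full, dropping packet" in line
    ]
```

Calling `read_conntrack()` periodically and noting values close to 1.0 around the time of an outage would support the full-table theory; `table_full_lines()` can be fed the captured output of `dmesg`.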

I have also seen this where the internal network is flooded due to misconfiguration or malware on a computer. If the lights on the switch flash like an insane Christmas tree, this might be your problem.

iptraf is a really useful tool to install for monitoring each of the interfaces. From the sounds of things, you should monitor the internal interface and see what the activity looks like. This might point you in the right direction.
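To pin down exactly when outages happen, as the question asks, something like the following could run on both the Ubuntu machine and the CentOS server. It is only a sketch: the target address, interval, log path and function names are placeholders of mine, the `ping` flags are the Linux ones, and `traceroute` may need installing first.

```python
import datetime
import subprocess
import time

TARGET = "8.8.8.8"             # assumption: an external IP that answers pings
INTERVAL_SECONDS = 5           # assumption: how often to probe
LOG_FILE = "connectivity.log"  # hypothetical log path


def ping_ok(host, run=subprocess.run):
    """Return True if one ping to host succeeds (Linux flags, 2 s timeout)."""
    result = run(
        ["ping", "-c", "1", "-W", "2", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def traceroute(host, run=subprocess.run):
    """Return traceroute output as text; -n skips reverse DNS lookups."""
    try:
        result = run(
            ["traceroute", "-n", "-w", "2", host],
            capture_output=True,
            text=True,
        )
    except FileNotFoundError:
        return "traceroute is not installed\n"
    return result.stdout


def log_line(status, host, now=None):
    """Format one timestamped log entry."""
    now = now or datetime.datetime.now()
    return f"{now.isoformat(timespec='seconds')} {status} {host}"


def monitor(host=TARGET, interval=INTERVAL_SECONDS, log_path=LOG_FILE,
            iterations=None):
    """Ping repeatedly; on each failure, append a traceroute to the log."""
    done = 0
    while iterations is None or done < iterations:
        ok = ping_ok(host)
        with open(log_path, "a") as log:
            log.write(log_line("OK" if ok else "FAIL", host) + "\n")
            if not ok:
                log.write(traceroute(host) + "\n")
        done += 1
        time.sleep(interval)
```

Calling `monitor()` starts an endless loop, so run it inside screen or tmux. Comparing the FAIL timestamps and traceroute hops from a desktop against those from the server should show whether the two really take different paths during an outage.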

Good luck.
