Cisco Switch – Diagnosing a Potentially Bad Port

cisco, cisco-catalyst, hardware, networking, switch

I've been chasing a packet-loss and network-stability issue for a handful of end users on an internal network for the past few days… These issues surfaced last week; however, the location was struck by lightning six weeks ago.

I was seeing 5-10% packet loss between a stack of four Cisco 2960s and several PCs and phones on the other side of a 77-meter run. The PCs were connected inline through the phones over a trunked link (switchport configuration pastebin). We were seeing dropped calls and interruptions in client-server applications and Microsoft Exchange connectivity.

I tried the usual troubleshooting steps remotely, having a local technician do the following during breaks in user and production activity:

  • change cables between the wall jack and device.
  • change patch cables between the patch panel and switch port(s).
  • try different switch ports within the 2960 stack.
  • change end-user devices with known-good equipment (new phones, different PCs).
  • clear switch port interface counters and monitor incrementing errors closely. (Pastebin output of sh int)
  • review the device logs and Observium RRD graphs. (no link up/down issues from the switch side)
  • change power strips on the end-user side.
  • test cable runs from the Cisco 2960 using test cable-diagnostics tdr int Gi4/0/9 (clean)*
  • test cable runs with a Tripp-Lite cable tester. (clean)
  • run diagnostics on the switch stack members. (clean)

In the end, it took three changes of switch ports to find a stable solution. The only logical conclusion is that a few Cisco 2960 switch ports are bad or flaky… Not dead, but not consistent in behavior either. I'm not used to seeing individual ports die in this manner.

What else can I test or check to determine if these devices are bad?

What is the best-practices approach to verifying this?

Is it common for single ports to have problems, rather than a contiguous bank of ports?


BTW – show cable-diagnostics tdr int Gi4/0/14 is very cool…

Interface Speed Local pair Pair length        Remote pair Pair status
--------- ----- ---------- ------------------ ----------- --------------------
Gi4/0/14  1000M Pair A     79   +/- 0  meters Pair B      Normal              
                Pair B     75   +/- 0  meters Pair A      Normal              
                Pair C     77   +/- 0  meters Pair D      Normal              
                Pair D     79   +/- 0  meters Pair C      Normal              
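If you want to check a lot of ports this way, the TDR table is regular enough to parse. The following is only a rough sketch in Python, not anything Cisco ships: the function names and the 5-meter spread threshold are my own assumptions, and the regex is written against the output format shown above. Some spread between pairs is expected because each pair has a different twist rate, so treat the threshold as a starting point.

```python
import re

# Sample copied from the `show cable-diagnostics tdr` output above.
SAMPLE = """\
Interface Speed Local pair Pair length        Remote pair Pair status
--------- ----- ---------- ------------------ ----------- --------------------
Gi4/0/14  1000M Pair A     79   +/- 0  meters Pair B      Normal
                Pair B     75   +/- 0  meters Pair A      Normal
                Pair C     77   +/- 0  meters Pair D      Normal
                Pair D     79   +/- 0  meters Pair C      Normal
"""

# Matches one pair row: local pair, length, tolerance, remote pair, status.
PAIR_RE = re.compile(
    r"Pair\s+([A-D])\s+(\d+)\s+\+/-\s+(\d+)\s+meters\s+Pair\s+([A-D])\s+(\S+(?:\s\S+)*)"
)


def parse_tdr(text):
    """Return one dict per cable pair found in the TDR table."""
    rows = []
    for line in text.splitlines():
        m = PAIR_RE.search(line)
        if m:
            local, length, tol, remote, status = m.groups()
            rows.append({
                "pair": local,
                "length_m": int(length),
                "tolerance_m": int(tol),
                "remote": remote,
                "status": status,
            })
    return rows


def summarize(rows, max_spread_m=5):
    """Flag a run if any pair is not 'Normal' or the lengths disagree too much.

    max_spread_m is a hypothetical threshold, not a Cisco-documented value.
    """
    if not rows:
        raise ValueError("no TDR pair rows found")
    lengths = [r["length_m"] for r in rows]
    spread = max(lengths) - min(lengths)
    abnormal = [r["pair"] for r in rows if r["status"] != "Normal"]
    return {
        "spread_m": spread,
        "abnormal_pairs": abnormal,
        "suspect": bool(abnormal) or spread > max_spread_m,
    }


if __name__ == "__main__":
    print(summarize(parse_tdr(SAMPLE)))
```

For the run above this reports a 4-meter spread with no abnormal pairs, which agrees with the "clean" result from the cable tester.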

Best Answer

While banks of ports often share an ASIC, each port has to have its own separate PHY. If a PHY has been damaged, it could very well have a problem while its neighbors don't.

That said, output drops are an odd symptom for a physical problem - not impossible, but not typical. Half-duplex links aside, output drops usually have more to do with buffer exhaustion than physical problems.

You may get more information by setting up a packet capture on the other side of the wire. A bad PHY would be expected to manifest as some number of physical-layer errors (bad CRC, runts/giants, etc.) on one or both sides of the link.
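To make that distinction concrete, here is a minimal Python sketch that compares two snapshots of `show interfaces` text (taken after clearing counters and again some time later) and says which kind of counter is moving. The counter values in the samples are made up for illustration, the function names and verdict labels are my own, and the patterns assume the usual IOS wording of those counter lines; it is a rule of thumb, not a diagnosis.

```python
import re

# Counter lines in the usual IOS `show interfaces` wording.
# The numbers below are invented for illustration only.
BEFORE = """\
  Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 0
     0 runts, 0 giants, 0 throttles
     0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
"""
AFTER = """\
  Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 1520
     0 runts, 0 giants, 0 throttles
     0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
"""

PATTERNS = {
    "output_drops": r"Total output drops:\s*(\d+)",
    "runts": r"(\d+)\s+runts",
    "giants": r"(\d+)\s+giants",
    "input_errors": r"(\d+)\s+input errors",
    "crc": r"(\d+)\s+CRC",
}
# Counters that point at the physical layer (cable, connector, PHY).
PHYSICAL_KEYS = ("runts", "giants", "input_errors", "crc")


def parse_counters(text):
    """Extract the counters of interest; anything not found counts as 0."""
    counters = {}
    for name, pattern in PATTERNS.items():
        m = re.search(pattern, text)
        counters[name] = int(m.group(1)) if m else 0
    return counters


def classify(before, after):
    """Return (verdict, delta) for two counter snapshots of the same port."""
    delta = {k: after[k] - before[k] for k in PATTERNS}
    physical = any(delta[k] > 0 for k in PHYSICAL_KEYS)
    drops = delta["output_drops"] > 0
    if physical and drops:
        verdict = "both"
    elif physical:
        verdict = "physical"    # suspect cable, connector, or PHY
    elif drops:
        verdict = "congestion"  # suspect buffers/QoS before hardware
    else:
        verdict = "clean"
    return verdict, delta


if __name__ == "__main__":
    print(classify(parse_counters(BEFORE), parse_counters(AFTER)))
```

If only output drops are incrementing, the result is "congestion", which is the case where I would look at buffering before blaming the port hardware.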

All in all it sounds like you've eliminated enough that it may be past the point of diminishing returns. I'd recommend an RMA if you have a contract.
