Cisco – Finding the root cause of Spanning-Tree recalculations (on Cisco Nexus 9000s)


I have a nagging Rapid PVST problem on some Nexus 9000 switches: spanning tree recalculates 3 to 5 times an hour. Our topology (summarized) is:

  Edge Router                             Access Layer
+-------------+                         +-------------+
|             | Eth1/28         Eth1/54 |             |
| Nexus9000_1 +-------------------------+ Nexus9000_2 |
|             |         Vlan350         |             |
+-------------+       dot1q Trunk       +-------------+
                                               |Eth1/45 (dot1q trunk)
                                               |
                                      Something_Important

SHOW OUTPUT: Nexus9000_1

Nexus9000_1# sh spanning-tree vlan 350 detail | i from|topology|VLAN
 VLAN0350 is executing the rstp compatible Spanning Tree protocol
  Number of topology changes 1348 last change occurred 0:35:39 ago <---
          from Ethernet1/28                                        <---
  Times:  hold 1, topology change 35, notification 2
  Timers: hello 0, topology change 0, notification 0
  ... Output snipped ...

SHOW OUTPUT: Nexus9000_2

Nexus9000_2# sh spanning-tree vlan 350 detail | i from|topology|VLAN
 VLAN0350 is executing the rstp compatible Spanning Tree protocol
  Number of topology changes 1157 last change occurred 0:35:39 ago <---
          from Ethernet1/54                                        <---
  Times:  hold 1, topology change 35, notification 2
  Timers: hello 0, topology change 0, notification 0
  ... Output snipped ...
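If the topology-change counters are polled repeatedly (for example, to confirm the 3-to-5-per-hour rate), it helps to extract the relevant fields from this output automatically. The Python sketch below is a hypothetical helper, not a Cisco tool. Its regular expressions assume the exact line layout quoted above, including the H:MM:SS format of the "last change occurred" field, and may need adjusting for other NX-OS releases or longer intervals. How the output is collected from the switch is left out.

```python
import re
from dataclasses import dataclass


@dataclass
class TopologyChangeInfo:
    """Topology-change fields from one VLAN's 'show spanning-tree vlan N detail'."""
    vlan: str
    change_count: int
    seconds_since_last_change: int
    from_interface: str


# Assumed line formats, copied from the output quoted in the question.
VLAN_RE = re.compile(r"^\s*(VLAN\d+) is executing")
COUNT_RE = re.compile(
    r"Number of topology changes (\d+) last change occurred (\d+):(\d+):(\d+) ago"
)
FROM_RE = re.compile(r"^\s*from (\S+)")


def parse_stp_detail(text: str) -> TopologyChangeInfo:
    """Extract VLAN, change counter, age of last change and 'from' interface."""
    vlan = count = age = iface = None
    for line in text.splitlines():
        if m := VLAN_RE.match(line):
            vlan = m.group(1)
        elif m := COUNT_RE.search(line):
            count = int(m.group(1))
            # Assumes the age is printed as H:MM:SS, as in the sample.
            hours, minutes, seconds = (int(g) for g in m.group(2, 3, 4))
            age = hours * 3600 + minutes * 60 + seconds
        elif m := FROM_RE.match(line):
            iface = m.group(1)
    if vlan is None or count is None or age is None or iface is None:
        raise ValueError("VLAN, topology-change or 'from' line not found")
    return TopologyChangeInfo(vlan, count, age, iface)


def changes_per_hour(before: TopologyChangeInfo, after: TopologyChangeInfo,
                     elapsed_seconds: float) -> float:
    """Topology changes per hour between two snapshots taken elapsed_seconds apart."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return (after.change_count - before.change_count) * 3600 / elapsed_seconds


if __name__ == "__main__":
    sample = (
        " VLAN0350 is executing the rstp compatible Spanning Tree protocol\n"
        "  Number of topology changes 1348 last change occurred 0:35:39 ago <---\n"
        "          from Ethernet1/28                                        <---\n"
    )
    print(parse_stp_detail(sample))
```

Polling the same VLAN twice, an hour apart, and passing both snapshots to changes_per_hour should report roughly the 3 to 5 changes per hour described above.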

BACKGROUND

I noticed the STP recalculations because we received many complaints that the device connected to Nexus9000_2 Eth1/45 kept suffering outages of roughly 30 seconds. Configuring Nexus9000_2 Eth1/45 as spanning-tree port type edge trunk made the problem much less visible, because an edge port moves straight to the forwarding state instead of passing through the discarding and learning states.

I have verified that the interfaces in this diagram are not flapping.

QUESTION

Each switch reports that it received the topology change notification (in RSTP, a BPDU with the TC flag set) from the other switch, which is not very helpful. I also don't want to band-aid the problem by leaving spanning-tree port type edge trunk configured on Eth1/45.
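A common way to use the "from" lines is to follow them switch by switch toward the device where the change was first detected. The Python sketch below shows that walk under stated assumptions: the neighbor map is hypothetical and would have to be built by hand (for example, from CDP or LLDP), every neighbor in it must also appear in the "from" map, and a "from" interface with no STP-speaking neighbor is taken to mean the change was detected locally on that switch. Fed the two outputs from the question, the walk returns to its starting switch, which is why the "from" lines alone are inconclusive here.

```python
from typing import Dict, List, Optional, Set, Tuple

# One hop of the trail: (switch name, interface from its "from <intf>" line).
Hop = Tuple[str, str]


def trace_tc_source(
    start: str,
    last_change_from: Dict[str, str],
    neighbor_on: Dict[Hop, Optional[str]],
) -> Tuple[List[Hop], str]:
    """Follow 'from <interface>' pointers from switch to switch.

    Returns the hops visited plus a verdict:
      "origin": the trail stopped at an interface with no known STP neighbor,
                i.e. (by assumption) the change was detected on that switch;
      "loop":   the trail came back to a switch already visited, so the
                'from' lines alone cannot identify the source.
    """
    trail: List[Hop] = []
    seen: Set[str] = set()
    switch = start
    while switch not in seen:
        seen.add(switch)
        iface = last_change_from[switch]
        trail.append((switch, iface))
        neighbor = neighbor_on.get((switch, iface))
        if neighbor is None:
            return trail, "origin"
        switch = neighbor
    return trail, "loop"


if __name__ == "__main__":
    # Values taken from the two show outputs in the question.
    last_change_from = {"Nexus9000_1": "Ethernet1/28", "Nexus9000_2": "Ethernet1/54"}
    # Hypothetical neighbor map for the summarized diagram (trunk links only).
    neighbor_on = {
        ("Nexus9000_1", "Ethernet1/28"): "Nexus9000_2",
        ("Nexus9000_2", "Ethernet1/54"): "Nexus9000_1",
    }
    trail, verdict = trace_tc_source("Nexus9000_1", last_change_from, neighbor_on)
    print(verdict, trail)
```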

What is the best way to find the root cause of these STP topology changes using the tools available on Nexus 9000 switches?

Please don't answer with show spanning-tree internal event-history all or other show spanning-tree internal commands unless you explain exactly what to look for in their output.

Best Answer

In my case, I was able to solve the problem by turning on these debugs on Nexus9000_2:

  • debug spanning-tree rstp interface eth1/54
  • debug spanning-tree event interface eth1/54
  • debug spanning-tree bpdu_rx interface eth1/54

The next time a BPDU triggered a recalculation, the debug output showed in detail what was happening on the switchport.

The output of this command was also useful: sh spanning-tree internal event-history all | begin VLAN0350
