I have a nagging Rapid PVST problem on some Nexus 9000 switches: Rapid PVST recalculates 3 to 5 times an hour. The topology, summarized:
  Edge Router                               Access Layer
+-------------+                           +-------------+
|             | Eth1/28           Eth1/54 |             |
| Nexus9000_1 +---------------------------+ Nexus9000_2 |
|             |          Vlan350          |             |
+-------------+        dot1q Trunk        +------+------+
                                                 |Eth1/45 (dot1q trunk)
                                                 |
                                         Something_Important
SHOW OUTPUT: Nexus9000_1
Nexus9000_1# sh spanning-tree vlan 350 detail | i from|topology|VLAN
VLAN0350 is executing the rstp compatible Spanning Tree protocol
Number of topology changes 1348 last change occurred 0:35:39 ago <---
from Ethernet1/28 <---
Times: hold 1, topology change 35, notification 2
Timers: hello 0, topology change 0, notification 0
... Output snipped ...
SHOW OUTPUT: Nexus9000_2
Nexus9000_2# sh spanning-tree vlan 350 detail | i from|topology|VLAN
VLAN0350 is executing the rstp compatible Spanning Tree protocol
Number of topology changes 1157 last change occurred 0:35:39 ago <---
from Ethernet1/54 <---
Times: hold 1, topology change 35, notification 2
Timers: hello 0, topology change 0, notification 0
... Output snipped ...
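For anyone who wants to track the counters above over time instead of eyeballing them, here is a minimal Python sketch that pulls the VLAN, topology-change count, age, and source port out of that filtered output. The function names (parse_topology_changes, new_changes) are made up for illustration, and the sketch assumes the text has already been captured from the switch by some means; it does not talk to the switch itself.

```python
import re

# Sample text copied from the filtered "show spanning-tree vlan 350 detail"
# output above, including the "<---" markers added by hand in the post.
SAMPLE = """\
VLAN0350 is executing the rstp compatible Spanning Tree protocol
Number of topology changes 1348 last change occurred 0:35:39 ago <---
from Ethernet1/28 <---
Times: hold 1, topology change 35, notification 2
"""

# One match per VLAN block: VLAN name, change count, age of last change,
# and the port the last change was received from. Trailing text after
# "ago" (such as "<---") is tolerated.
PATTERN = re.compile(
    r"^\s*(VLAN\d+) is executing.*?"
    r"Number of topology changes (\d+) last change occurred (\S+) ago[^\n]*\n"
    r"\s*from (\S+)",
    re.MULTILINE | re.DOTALL,
)


def parse_topology_changes(text):
    """Return a list of dicts, one per VLAN block found in the text."""
    results = []
    for match in PATTERN.finditer(text):
        vlan, count, age, port = match.groups()
        results.append(
            {
                "vlan": vlan,
                "changes": int(count),
                "last_change_ago": age,
                "from_port": port,
            }
        )
    return results


def new_changes(before, after):
    """Topology changes seen between two polls of the same VLAN."""
    return after["changes"] - before["changes"]


if __name__ == "__main__":
    for entry in parse_topology_changes(SAMPLE):
        print(entry)
```

Comparing two polls with new_changes shows whether the counter moved in a given window, which helps line up a recalculation with whatever the debugs or event history logged at that moment.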
BACKGROUND
I found the STP recalculations because we got many complaints about the device connected to Nexus9000_2 Eth1/45 having repeated outages of roughly 30 seconds. Configuring Nexus9000_2 Eth1/45 as "spanning-tree port type edge trunk" made the problem much less visible, because the port moves to the forwarding state much faster with that port type.
I checked and know that the interfaces in this diagram are not flapping.
QUESTION
Each of those switches says it received a topology change notification (TCN) from the other switch. That's not very helpful, and I don't want to band-aid the problem with "spanning-tree port type edge trunk" on port Eth1/45.
What is the best way to find the root cause of these STP topology changes using the tools available on Nexus 9000 switches?
Please don't respond with "show spanning-tree internal event-history all" or other "show spanning-tree internal" commands without explaining what exactly to look for in those commands.
Best Answer
In my case, I was able to solve the problem by turning on these debugs on Nexus9000_2:
debug spanning-tree rstp interface eth1/54
debug spanning-tree event interface eth1/54
debug spanning-tree bpdu_rx interface eth1/54
The next time a BPDU triggered a calculation, the debug gave me detailed information on what was happening on the switchport.
The output of this command was also useful:
sh spanning-tree internal event-history all | begin VLAN0350