Cisco – How to anticipate spanning-tree changes to prevent packet loss during convergence

cisco, packetloss, spanning tree

We have a LAN with Cisco switches, redundant cabling and spanning tree.
If I understand it correctly, when I pull out a redundant cable (that is currently "used" by the spanning tree) it takes several seconds until the spanning tree converges in reaction. How can I prevent this packet loss (assuming of course I know beforehand that the cable will be pulled)? That is, how can I make the spanning tree adapt "proactively"?

I would have guessed that an interface shutdown plus waiting a couple of seconds should suffice, but did not dare to try that out yet. Actually, I am afraid an interface shutdown would cause the same interruption times during convergence, because I suffered from such an interruption yesterday when making a supposedly harmless configuration change at some interfaces. (Edit: I just confirmed this experimentally; as expected there was some 20 seconds of interruption after interface shutdown. Note that I am looking for a "lossless" solution, not just "less loss".)

Best Answer

It sounds like you're using classic STP instead of Rapid STP. Two options will speed up the convergence time significantly.

interface *server interface*
spanning-tree portfast

This should be applied to server interfaces. It will tell STP that there is no switch on the other side of this port, and that it is safe to skip the normal "safe" method of preventing loops. The port should move straight to forwarding.

spanning-tree mode rapid-pvst

Enables the newer Rapid Per-VLAN Spanning Tree protocol, which uses messages between switches to enable re-convergence within a couple of seconds rather than the 30 to 50 seconds classic STP can take with default timers.

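You might also try setting up a port-channel between your switches instead of redundant single links. Spanning tree treats the port-channel as a single logical link, so if one member port is lost, all traffic fails over to the remaining port without a spanning-tree reconvergence.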

Related Solutions

How to diagnose packet loss

I am a network engineer, so I'll describe this from my perspective.

For me, diagnosing packet loss usually starts with "it's not working very well". From there, I usually try to find kit as close to both ends of the communication (typically, a workstation in an office and a server somewhere) and ping as close to the other end as possible (ideally the "remote end-point", but sometimes there are firewalls I can't send pings through, so will have to settle for a LAN interface on a router) and see if I can see any loss.

If I can see loss, it's usually a case of "not enough bandwidth" or "link with issues" somewhere in between, so find the route through the network and start testing from the middle. That usually tells you which half of the path the problem is in.

If I cannot see loss, the next two steps tend to be "send more pings" or "send larger pings". If that doesn't give an indication of what the problem is, it's time to start looking at QoS policies and interface statistics through the whole path between the end-points.
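The "more pings" and "larger pings" steps can be scripted. The following is only a minimal sketch, assuming a Unix-style `ping` that accepts `-c` (count) and `-s` (payload size) and prints the usual "N packets transmitted, M received" summary line; the function names `loss_percent` and `ping` are made up for this illustration.

```python
import re
import subprocess

# Matches both the Linux ("9 received") and BSD/macOS ("9 packets received")
# forms of the ping summary line.
SUMMARY = re.compile(r"(\d+) packets transmitted, (\d+) (?:packets )?received")


def loss_percent(ping_output):
    """Compute the loss percentage from the summary line of ping output."""
    match = SUMMARY.search(ping_output)
    if match is None:
        raise ValueError("no ping summary line found")
    sent, received = int(match.group(1)), int(match.group(2))
    return 100.0 * (sent - received) / sent if sent else 0.0


def ping(host, count=100, size=56):
    """Send `count` echo requests with `size` payload bytes; return loss in %."""
    result = subprocess.run(
        ["ping", "-c", str(count), "-s", str(size), host],
        capture_output=True, text=True, check=False,
    )
    return loss_percent(result.stdout)
```

Calling `ping(target, count=500)` and then `ping(target, count=500, size=1400)` against the same target gives two loss figures to compare; loss that only appears with the larger payload would point towards an MTU or fragmentation issue rather than plain congestion.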

If that doesn't find anything, it's time to start questioning your assumptions: are you actually suffering from packet loss? The only sure way of finding out is to do simultaneous captures on both ends, either by using Wireshark (or equivalent) on the hosts or by hooking up sniffer machines (probably using Wireshark or similar) via network taps. Then comes the fun of comparing the two packet captures...

Sometimes, what is described as "packet loss" is simply something on the server side being noticeably slower (like, say, moving the database from "on the same LAN" to "20 ms away" and using queries that require an awful lot of back-and-forth between the front-end and the database).

Linux – How to passively monitor for TCP packet loss? (Linux)

For a general sense of the scale of your problem, netstat -s reports your total number of retransmissions.

# netstat -s | grep retransmitted
     368644 segments retransmitted

You can also grep for segments to get a more detailed view:

# netstat -s | grep segments
         149840 segments received
         150373 segments sent out
         161 segments retransmitted
         13 bad segments received
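The same counters can be read programmatically. Here is a rough sketch, assuming a Linux host where the kernel exposes the TCP counters as a header line and a value line, both prefixed with `Tcp:`, in /proc/net/snmp; the helper names `tcp_counters` and `retransmit_ratio` are hypothetical.

```python
from pathlib import Path


def tcp_counters(snmp_text):
    """Parse the two 'Tcp:' lines (names, then values) into a dict of ints."""
    tcp_lines = [line.split()[1:] for line in snmp_text.splitlines()
                 if line.startswith("Tcp:")]
    if len(tcp_lines) < 2:
        raise ValueError("no Tcp header/value pair found")
    names, values = tcp_lines[0], tcp_lines[1]
    return dict(zip(names, (int(v) for v in values)))


def retransmit_ratio(counters):
    """RetransSegs divided by OutSegs (0.0 if nothing has been sent)."""
    sent = counters.get("OutSegs", 0)
    return counters.get("RetransSegs", 0) / sent if sent else 0.0


if __name__ == "__main__":
    snmp = Path("/proc/net/snmp")
    if snmp.exists():  # only present on Linux
        counters = tcp_counters(snmp.read_text())
        print(counters["RetransSegs"], "segments retransmitted,",
              f"ratio to segments sent: {retransmit_ratio(counters):.4%}")
```

Sampling this periodically and comparing successive values shows whether retransmissions are still climbing, which the cumulative totals alone do not reveal.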

For a deeper dive, you'll probably want to fire up Wireshark.

In Wireshark set your filter to tcp.analysis.retransmission to see retransmissions by flow.

That's the best option I can come up with.

Other dead ends explored:

netfilter/conntrack tools don't seem to keep retransmits
stracing netstat -s showed that it is just printing /proc/net/netstat
column 9 in /proc/net/tcp looked promising, but it unfortunately appears to be unused.
