Cisco – Lowest Possible Failover Latency

cisco, failover, layer2, layer3

Let’s say I have two networking devices (either L2 switches, L3 switches, or routers), and the two devices have two ethernet connections between them: port 1 to port 1, and port 2 to port 2. If one of the connections is broken, I want the traffic to fail over with the lowest possible latency (less than 1ms) to the other connection. What method of failover would result in the lowest possible latency failover? A layer 2 protocol like RSTP, a layer 3 protocol like OSPF with hello timers tuned waaaay down, or something else?

Best Answer

Failover time is going to be dictated by two chief factors:

1.) Time to detection - How quickly the connected systems can determine a link has failed.

2.) Time to convergence - Once the failure is detected, how quickly traffic can be redirected to a working path.

Honestly, the first item (failure detection) is harder to get right and ends up adding the most time to the process. Loss of physical link is clearly an easy indicator, but it's important to remember that this may only be seen in one direction (e.g. one strand of a fiber pair fails but not the other) and that any kind of intervening repeater may actually mask the behavior.

To address this, the best mechanisms tend to use some kind of lightweight echo mechanism - constant OAM traffic of some sort moving back and forth between connected peers to validate that data is actually passing. Synchronous communication links tend to incorporate this into the basic protocol pretty effectively (SONET being a great example), but we've had to graft technologies on to accommodate higher-level protocols - either in the form of protocol hellos/keep-alives (as found in routing protocols) or, more recently, with lightweight protocols like bidirectional forwarding detection (BFD), which was mentioned in a recent question. The idea with BFD is a common keepalive mechanism (again - very lightweight) operating at a high frequency (usually in the low hundreds of milliseconds), often with hardware assist.
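As a rough illustration, a BFD session with a 50 ms transmit/receive interval and a multiplier of 3 declares the link dead after three missed packets, i.e. ~150 ms. A hypothetical IOS-style sketch (interface name, addressing, and OSPF process number are assumptions, and the exact syntax varies by platform) might look like:

```
! Sketch only: BFD sending control packets every 50 ms; three
! consecutive missed packets (~150 ms) tear the session down.
interface GigabitEthernet0/1
 ip address 10.0.0.1 255.255.255.252
 bfd interval 50 min_rx 50 multiplier 3
!
router ospf 1
 ! Register OSPF as a BFD client so a BFD session failure
 ! immediately drops the OSPF adjacency on that interface.
 bfd all-interfaces
```

The point is that the routing protocol keeps its own (slow) timers but inherits BFD's fast detection.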

The second part (reconvergence) has a lot of other issues associated with it, but its difficulty tends to be directly proportional to the width of the network. For example - reconverging connectivity between two switches or routers with a pair of redundant links is trivial. Finding an alternate path on a complex international network with thousands of network devices? A whole other ballgame. This, incidentally, is where SONET's gold standard of 50ms comes from: APS calls for each node to have an alternate path already hot and ready to receive failed traffic.

So - to answer the question... The best possible case is one where a link fails quickly and completely (i.e. someone snips a cable). This delivers immediate results to both connected devices and, in practice, you're not going to see a whole lot of difference between removing one of a pair of equal-cost routes from an L3 forwarding table versus updating the hash tables in an L2 port channel.
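For the two-links-between-two-devices case in the question, the L2 version of "remove the failed path" is just bundling both links into a port channel, so a detected failure only rehashes flows onto the surviving member. A hypothetical IOS-style sketch (interface names assumed):

```
! Sketch only: bundle the two back-to-back links with LACP so the
! hardware rehashes traffic onto the surviving link on failure.
interface range GigabitEthernet0/1 - 2
 channel-group 1 mode active
!
interface Port-channel1
 switchport mode trunk
```

On link-down, the rehash itself is effectively instantaneous; as noted above, detection is the part that costs time.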

That said - if you're running an L2 port channel without a protocol to detect a link failure and one link happens to go unidirectional, then you might well hit a situation where some portion of traffic is silently dropped on an indefinite basis (i.e. no recovery). If you're relying on LACP or UDLD to pick up this condition, then it may take single-digit or tens of seconds to detect (depending on how the protocols are configured). A stock configuration of OSPF is going to take 40 seconds (a dead interval of four consecutive missed 10-second hellos) to mark a link as failed. A vanilla BGP connection on some implementations could easily be 3+ minutes (a 180-second hold timer). If you add BFD to any of these protocols (LACP / OSPF / BGP), then detection time could be as quick as ~150-200 milliseconds, but in actual practice is probably more like 300ms in the real world.
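If BFD isn't available, the per-protocol timers above can themselves be tuned down. The following are hypothetical IOS-style sketches (interface names assumed; support varies by platform) of two common knobs:

```
! Sketch only: OSPF "fast hello" - sub-second hellos with the dead
! interval pinned to 1 second (here, four 250 ms hellos).
interface GigabitEthernet0/1
 ip ospf dead-interval minimal hello-multiplier 4
!
! Sketch only: LACP fast rate - request LACPDUs every 1 s instead of
! every 30 s, cutting worst-case detection from ~90 s to ~3 s.
interface GigabitEthernet0/2
 lacp rate fast
```

Either gets you into the single-digit-second (or just under one second) range, which is still an order of magnitude slower than BFD.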

So is 1ms consistently possible under all conditions on common hardware? Probably not, unless you've got hardware capable of reliably sending, receiving and processing OAM traffic at double-digit nanosecond precision (and there is a whole rat's nest of issues keeping such mechanisms stable). The real question tends to be figuring out the convergence speed that makes the most sense given the protocols running over the link. For standard Ethernet and IP, getting into the < 250-300 ms range (from actual failure to full recovery) under any circumstances - with low double-digit milliseconds under common circumstances - has proven more than sufficient.
