Reasons to Avoid Using BFD – Network Design Considerations

cisco design ethernet juniper mpls

In looking to implement Bidirectional Forwarding Detection (BFD), it seems very flexible in terms of timer tuning and lightweight in terms of overhead, and its range of possible applications appears very impressive.

So if, for example, it can be applied to detect link failure over Ethernet, over MPLS across multiple hops, at the network edge, for IGP convergence, for tunnels and so on, why would it not be used in certain scenarios, and are there other emerging alternatives to be aware of?

Best Answer

I am only directly aware of one issue with BFD, which is CPU demand. I am currently investigating an issue with a Cisco 7301: when it pushes more traffic during our peak hours than during the rest of the day, BFD sometimes times out and routing fails over to the next link.

It seems that under high traffic volumes the router's CPU usage rises (which isn't unusual), but at about 40-50% CPU the BFD packets no longer get enough CPU time to be processed before the session times out.

However, I have found the following information, which suggests additional issues with BFD. (It comes from this NANOG presentation; there is more in the presentation, and it's a good one, so give it a read!)

What are the caveats?

  • Two main ones:
    1. BFD can have high resource demands depending on your scale.
    2. BFD is not visible to Layer 2 bundling protocols. (Ethernet LAGs or POS bundles)

BFD Resource Demands

  • The number of BFD sessions on each linecard or router can impact how well BFD scales for you. Each unique platform has its own limits.
  • Bundled interfaces whose lowest supported min tx/rx interval is 250ms, or even 2 seconds, have been seen.
  • In some cases, BFD instances on a router may need to be operated on the route-processor depending on the implementation (non-adjacency based BFD sessions).
  • Test your platform first before deploying BFD. Attempt to put load on the RP or LC CPU with your configured settings. This can be done by:
    • Executing CPU-heavy commands
    • Flooding packets to TTL expire on the destination

BFD Resource Demands (cont’d)

  • What values are safe to try?
  • Based upon speaking to several operators, 300ms with a multiplier of 3 (900ms detection) appears to be a safe value that works on most equipment fairly well.
  • This is a significant improvement over some of the alternatives.
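
To make the arithmetic behind the "300ms with a multiplier of 3" figure concrete, here is a minimal Python sketch. It assumes both ends use the same interval (in practice the timers are negotiated between peers) and it ignores transmit jitter, so the packet rate is only a rough estimate. The function names are made up for illustration and are not part of any vendor tool.

```python
def detection_time_ms(interval_ms: int, multiplier: int) -> int:
    """Time without a received BFD packet before the session is declared down.

    Assumes symmetric timers on both peers (a simplification).
    """
    return interval_ms * multiplier


def tx_packets_per_second(sessions: int, interval_ms: int) -> float:
    """Rough number of BFD packets sent per second for a given session count.

    Ignores transmit jitter, so treat the result as an estimate only.
    """
    return sessions * 1000 / interval_ms


# The "safe" value quoted above: 300 ms x 3 = 900 ms detection time.
print(detection_time_ms(300, 3))

# Hypothetical scale example: 100 sessions at 300 ms each.
print(round(tx_packets_per_second(100, 300), 1))
```

The second function gives a feel for why the session count matters: halving the interval doubles the number of packets the linecard or route processor has to send, and the peer has to receive, for every session.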

BFD and L2 link-bundling

  • BFD is unaware of underlying L2 link bundle members.
  • A 4x10GigE L2 bundle (802.3ad) would appear as a single L3 adjacency. BFD packets would be transmitted on a single member link, rather than out all 4 links.
  • A failure of the link with BFD on it would result in the entire L3 adjacency failing.
  • However, in some scenarios the failed member link may result in only a single BFD packet being dropped. Subsequent packets may route over working member links.
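
The single-member behaviour described above can be sketched in Python. This is only an illustration: it assumes the bundle picks a member by hashing packet header fields, and the CRC32 hash, interface names and addresses are hypothetical. Real platforms use their own hash functions and inputs.

```python
import zlib


def pick_member(flow: tuple, members: list[str]) -> str:
    """Pick a bundle member for a flow using a hypothetical header hash."""
    key = "|".join(map(str, flow)).encode()
    return members[zlib.crc32(key) % len(members)]


# Hypothetical 4x10GigE bundle and one single-hop BFD session (UDP port 3784).
members = ["te0/0", "te0/1", "te0/2", "te0/3"]
bfd_flow = ("192.0.2.1", "192.0.2.2", "udp", 49152, 3784)

# Every packet of the session has the same headers, so it hashes to one member.
first = pick_member(bfd_flow, members)
print("BFD session rides on:", first)

# If that member fails and the bundle removes it, the same flow is hashed
# across the surviving members and lands on a working link.
survivors = [m for m in members if m != first]
after = pick_member(bfd_flow, survivors)
print("After member removal:", after)
```

Whether the BFD session survives then depends on timing: if the bundle drops the failed member before the detection time expires, only a packet or so is lost; if not, the whole L3 adjacency is declared down even though three links still work.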