BGP – can someone explain the BGP router graceful restart flow?

bgp

Can someone please explain the BGP router graceful restart flow: how are the messages exchanged before the restart and after the session is re-established between a BGP router and its peer?

Best Answer

Consider the following topology R1---R2---R3.

The lines R1---R2 and R2---R3 represent both physical connections and EBGP sessions.

We are going to look at BGP UPDATEs that flow from R1 to R3, and therefore at data plane traffic that flows in the opposite direction, from R3 to R1.

In this example R2 will be restarting.

R1 ----> R2

OPEN
  Graceful restart capability => "I support graceful restart"
    Restart flags
      R = 0 => "I have not restarted"
      Restart Time = 30 => "If I restart in the future, I expect to be done in 30 seconds"
    AFI SAFI = IPv4 Unicast => "I support GR for IPv4-Unicast"
      AFI SAFI Flags
        F = 0 => "This is only relevant after a restart, so 0 for now"

R2 ----> R1

OPEN (similar to above)

R2 ----> R3

OPEN (similar to above)

R3 ----> R2

OPEN (similar to above)
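The fields in the OPEN messages above map onto a small binary capability. As a rough sketch (not taken from any particular BGP implementation), this is how the Graceful Restart capability could be encoded following the RFC 4724 layout: capability code 64, a 4-bit Restart Flags field whose top bit is R, a 12-bit Restart Time, and then one AFI/SAFI/flags entry per address family with F as the top bit of the flags octet. The function name and the tuple format are made up for illustration.

```python
import struct

GR_CAPABILITY_CODE = 64  # Graceful Restart capability code (RFC 4724)
AFI_IPV4 = 1
SAFI_UNICAST = 1


def encode_gr_capability(restarted, restart_time, families):
    """Build the capability as code, length, value (hypothetical helper).

    families is a list of (afi, safi, forwarding_preserved) tuples.
    """
    if not 0 <= restart_time < 4096:
        raise ValueError("Restart Time is a 12-bit field")
    # Top 4 bits: Restart Flags (R is the most significant bit).
    # Low 12 bits: Restart Time in seconds.
    flags_and_time = (0x8000 if restarted else 0) | restart_time
    value = struct.pack("!H", flags_and_time)
    for afi, safi, forwarding_preserved in families:
        # Per-family flags octet: F is the most significant bit.
        family_flags = 0x80 if forwarding_preserved else 0
        value += struct.pack("!HBB", afi, safi, family_flags)
    return struct.pack("!BB", GR_CAPABILITY_CODE, len(value)) + value


# R = 0, Restart Time = 30, IPv4 unicast with F = 0, as in the OPEN above.
initial = encode_gr_capability(False, 30, [(AFI_IPV4, SAFI_UNICAST, False)])
print(initial.hex())
```

In a real OPEN this capability sits inside the Capabilities optional parameter; that wrapping is left out here.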

Router R1 originates some prefixes 1.1.0.0/16 and 2.2.0.0/16

R1 ----> R2

UPDATE
  AFI-SAFI = IPv4-Unicast
  Prefix = 1.1.0.0/16, 2.2.0.0/16
  Attributes
    Next Hop = R1
    AS Path = 100
    etc.

Router R2 installs entries in the R2 FIB to forward traffic to 1.1.0.0/16 and 2.2.0.0/16 through router R1.

Router R2 propagates the BGP UPDATEs received from R1 to R3:

R2 ----> R3

UPDATE
  AFI-SAFI = IPv4-Unicast
  Prefix = 1.1.0.0/16, 2.2.0.0/16
  Attributes
    Next Hop = R2
    AS Path = 200 100
    etc.

Router R3 installs entries in the R3 FIB to forward traffic to 1.1.0.0/16 and 2.2.0.0/16 through router R2.
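The change between the UPDATE that R2 receives and the one it sends on can be sketched as below. This is a toy model using plain dictionaries, not a wire encoder; `propagate_ebgp` and the dictionary keys are invented for illustration, and AS 200 for R2 is taken from the AS Path shown above.

```python
def propagate_ebgp(route, local_as, local_address):
    """Return the copy of a route that a router would advertise to an EBGP peer."""
    return {
        "prefix": route["prefix"],
        # The advertising router sets itself as the next hop.
        "next_hop": local_address,
        # The advertising router prepends its own AS number.
        "as_path": [local_as] + route["as_path"],
    }


received_from_r1 = [
    {"prefix": "1.1.0.0/16", "next_hop": "R1", "as_path": [100]},
    {"prefix": "2.2.0.0/16", "next_hop": "R1", "as_path": [100]},
]
sent_to_r3 = [
    propagate_ebgp(route, local_as=200, local_address="R2")
    for route in received_from_r1
]
for route in sent_to_r3:
    print(route)
```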

*** The control plane on router R2 goes down for some reason (crash, upgrade) ***

The forwarding plane on router R2:

  1. Keeps forwarding packets using the routes that are currently installed in the FIB: traffic to 1.1.0.0/16 and 2.2.0.0/16 continues to be forwarded to router R1.
  2. Marks the routes in the FIB as stale (the spec says that the routes in the RIB get marked as stale, but this is difficult to implement since the control plane was just blown away).

Router R3 notices that the BGP session to R2 has gone down (e.g. because of a BFD timeout, a BGP hold timer expiry, or a link-down event).

Normally, router R3 would remove the BGP routes received from R2 (namely 1.1.0.0/16 and 2.2.0.0/16) from its RIB and from its FIB.

However, because router R2 advertised that it supports graceful restart, router R3 will

  1. Keep the routes from R2 in its RIB
  2. Mark the routes from R2 in its RIB as "stale"
  3. Keep the routes from R2 in its FIB
  4. Continue forwarding traffic for those routes in the FIB
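The difference between the normal behavior and the graceful restart behavior on R3 can be sketched as follows. This is a simplified model of the state R3 keeps per neighbor; the class name and fields are hypothetical and there is no real FIB behind it.

```python
class HelperPeerState:
    """Routes a helper router (here R3) has learned from one neighbor (here R2)."""

    def __init__(self, peer_supports_gr):
        self.peer_supports_gr = peer_supports_gr
        self.routes = {}  # prefix -> {"next_hop": ..., "stale": bool}

    def learn(self, prefix, next_hop):
        self.routes[prefix] = {"next_hop": next_hop, "stale": False}

    def session_down(self):
        if not self.peer_supports_gr:
            # Normal BGP behavior: drop everything learned from the peer.
            self.routes.clear()
            return
        # Graceful restart: keep the routes (and keep forwarding on them),
        # but mark them stale until the peer re-advertises them.
        for entry in self.routes.values():
            entry["stale"] = True


r3_view_of_r2 = HelperPeerState(peer_supports_gr=True)
r3_view_of_r2.learn("1.1.0.0/16", "R2")
r3_view_of_r2.learn("2.2.0.0/16", "R2")
r3_view_of_r2.session_down()
print(r3_view_of_r2.routes)
```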

*** Topology change ***

While router R2 is down, something changes in the topology of the network.

Let's say something changes that causes router R1 to stop advertising prefix 1.1.0.0/16.

Router R1 will remove route 1.1.0.0/16 from its RIB and FIB, and it sends a BGP UPDATE that withdraws the prefix to all of its neighbors (BGP has no separate WITHDRAW message; withdrawals are carried in the Withdrawn Routes field of an UPDATE).

However, R1 cannot send this withdrawal to R2 because the R1-R2 BGP session is down at this point.

We will recover from this "getting out of sync" problem later on.

*** The control plane on router R2 comes back up ***

We assume it took less than 30 seconds (the value of the Restart Time field in the OPEN messages that router R2 sent at the beginning) for the control plane to come back up and for the BGP sessions to be re-established. If it took longer, router R3 would have given up and flushed the routes learned from R2 from its RIB and FIB.
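The "give up" rule boils down to a simple timer comparison. A minimal sketch, assuming timestamps are plain seconds and using a made-up function name:

```python
def stale_routes_expired(session_down_at, now, restart_time):
    """True once the peer has been gone longer than its advertised Restart Time."""
    return (now - session_down_at) > restart_time


# R2 advertised Restart Time = 30 and its session dropped at t = 100.
print(stale_routes_expired(session_down_at=100, now=120, restart_time=30))
print(stale_routes_expired(session_down_at=100, now=131, restart_time=30))
```

The first call is still inside the 30-second window, so R3 keeps the stale routes; the second is past it, so R3 would flush them.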

The BGP sessions come back up:

R1 ----> R2

OPEN
  Graceful restart capability => "I support graceful restart"
    Restart flags
      R = 0 => "I have not restarted"
      Restart Time = 30 => "If I restart in the future, I expect to be done in 30 seconds"
    AFI SAFI = IPv4 Unicast => "I support GR for IPv4-Unicast"
      AFI SAFI Flags
        F = 0 => "This is only relevant after a restart, so 0 for now"

R2 ----> R1

OPEN
  Graceful restart capability => "I support graceful restart"
    Restart flags
      R = 1 => "*** I DID RESTART ***"
      Restart Time = 30 => "If I restart in the future, I expect to be done in 30 seconds"
    AFI SAFI = IPv4 Unicast => "I support GR for IPv4-Unicast"
      AFI SAFI Flags
        F = 1 => "*** I DID PRESERVE FORWARDING STATE IN THE FIB ***"

R3 ----> R2

Similar to R1->R2 OPEN

R2 ----> R3

Similar to R2->R1 OPEN
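On the receiving side, R1 and R3 learn about the restart by reading the R and F bits out of the capability in R2's new OPEN. A sketch of that decoding, assuming the RFC 4724 field layout and taking only the capability value (the bytes after the code and length octets) as input; the function name and the returned dictionary shape are invented for illustration.

```python
import struct


def decode_gr_value(value):
    """Parse the Graceful Restart capability value (hypothetical helper)."""
    (flags_and_time,) = struct.unpack_from("!H", value, 0)
    result = {
        "restarted": bool(flags_and_time & 0x8000),  # R bit
        "restart_time": flags_and_time & 0x0FFF,     # seconds
        "families": [],
    }
    # Each address family entry is 4 octets: AFI (2), SAFI (1), flags (1).
    for offset in range(2, len(value), 4):
        afi, safi, flags = struct.unpack_from("!HBB", value, offset)
        result["families"].append(
            {"afi": afi, "safi": safi, "forwarding_preserved": bool(flags & 0x80)}
        )
    return result


# Value R2 would send after restarting: R = 1, Restart Time = 30,
# IPv4 unicast (AFI 1, SAFI 1) with F = 1.
after_restart = bytes.fromhex("801e00010180")
parsed = decode_gr_value(after_restart)
print(parsed)
```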

*** Resynchronization ***

Router R2 knows it restarted, so it is NOT going to send any UPDATEs until it has received all UPDATEs from its neighbors (R1 and R3), selected the best routes, and updated its RIB and FIB.

Router R1 knows that its neighbor R2 restarted, so it is going to re-send all routes to R2 followed by an End-of-RIB marker. Later, when R1 in turn receives the End-of-RIB marker from R2, it flushes any stale routes received from R2 from its RIB/FIB (there are not any in this example).

Similarly, router R3 knows that its neighbor R2 restarted, so it is going to re-send all routes to R2 (there are not any in this example), followed by an End-of-RIB marker. When R3 receives the End-of-RIB marker from R2, it flushes any stale routes received from R2 from its RIB/FIB.

So, let's walk through this in detail:

Router R1 now originates only prefix 2.2.0.0/16 (and not 1.1.0.0/16 anymore, due to the topology change mentioned above):

R1 ----> R2

UPDATE
  AFI-SAFI = IPv4-Unicast
  Prefix = 2.2.0.0/16
  Attributes
    Next Hop = R1
    AS Path = 100
    etc.

Router R2 installs 2.2.0.0/16 in its RIB and in its FIB.

Router R2 already had an entry for 2.2.0.0/16 in its FIB which was marked stale. This stale marking is now removed; it is fresh again.

Router R2 does not receive a 1.1.0.0/16 UPDATE from R1. Hence, R2 does not have an entry for 1.1.0.0/16 in its RIB. But R2 does still have an entry for 1.1.0.0/16 in its FIB, which is still marked stale.

Router R1 has finished sending all routes to R2, so it sends an end-of-rib marker to R2:

R1 ----> R2

UPDATE
  AFI-SAFI = IPv4-Unicast
  End-of-RIB marker

At this point, router R2 has received an end-of-rib marker from R1, but not yet from R3. So, it does not yet take any action (it needs to have received an end-of-rib marker from all neighbors).
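For IPv4 unicast, the End-of-RIB marker is simply an UPDATE with no withdrawn routes, no path attributes and no NLRI, which makes it the minimum-length UPDATE. (For other address families, RFC 4724 uses an UPDATE carrying only an empty MP_UNREACH_NLRI attribute; that case is not shown.) A sketch of building and recognizing it, with illustrative function names:

```python
import struct

BGP_MARKER = b"\xff" * 16  # 16 octets of all ones at the start of every message
BGP_TYPE_UPDATE = 2


def build_ipv4_unicast_end_of_rib():
    """UPDATE with no withdrawn routes, no path attributes and no NLRI."""
    # Withdrawn Routes Length = 0, Total Path Attribute Length = 0.
    body = struct.pack("!HH", 0, 0)
    # Header is marker (16) + length (2) + type (1).
    length = len(BGP_MARKER) + 3 + len(body)
    return BGP_MARKER + struct.pack("!HB", length, BGP_TYPE_UPDATE) + body


def is_ipv4_unicast_end_of_rib(message):
    return message == build_ipv4_unicast_end_of_rib()


eor = build_ipv4_unicast_end_of_rib()
print(len(eor), eor.hex())
```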

Now, let's look at router R3.

In this example router R3 does not have any prefixes to send to R2, so it immediately sends an End-of-RIB marker:

R3 ----> R2

UPDATE
  AFI-SAFI = IPv4-Unicast
  End-of-RIB marker

At this point, router R2 has received End-of-RIB markers from all of its neighbors (R1 and R3), so it will take the following actions:

  1. R2 will run the best route selection process for every destination prefix in its RIB (in this example only 2.2.0.0/16)
  2. R2 will install the selected best route for every prefix in the RIB into the FIB (only 2.2.0.0/16)
  3. R2 will flush any remaining stale routes from the FIB (in this case 1.1.0.0/16)
  4. R2 will start sending UPDATEs to advertise (propagate) the routes in its RIB to the neighbors:

Router R2 propagates the BGP UPDATEs received from R1 to R3:

R2 ----> R3

UPDATE
  AFI-SAFI = IPv4-Unicast
  Prefix = 2.2.0.0/16
  Attributes
    Next Hop = R2
    AS Path = 200 100
    etc.
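The way R2 holds back until every neighbor has sent its End-of-RIB marker, and then performs the four actions listed above, can be sketched like this. It is a deliberately simplified model: route selection is trivial because there is only one candidate per prefix, FIB entries are refreshed only at the end (as in the numbered list), and the class and its fields are hypothetical.

```python
class RestartingSpeaker:
    def __init__(self, peers, preserved_fib):
        self.pending_eor = set(peers)  # peers we still await End-of-RIB from
        self.rib = {}                  # prefix -> next hop learned after restart
        # Every FIB entry preserved across the restart starts out stale.
        self.fib = {
            prefix: {"next_hop": next_hop, "stale": True}
            for prefix, next_hop in preserved_fib.items()
        }
        self.advertising = False       # no UPDATEs are sent while this is False

    def on_update(self, peer, prefix):
        self.rib[prefix] = peer

    def on_end_of_rib(self, peer):
        self.pending_eor.discard(peer)
        if self.pending_eor:
            return  # still waiting for other peers: keep deferring
        # Steps 1 and 2: select best routes and install them in the FIB.
        for prefix, next_hop in self.rib.items():
            self.fib[prefix] = {"next_hop": next_hop, "stale": False}
        # Step 3: flush whatever is still marked stale.
        self.fib = {p: e for p, e in self.fib.items() if not e["stale"]}
        # Step 4: from now on the router may advertise its routes.
        self.advertising = True


r2 = RestartingSpeaker(
    peers=["R1", "R3"],
    preserved_fib={"1.1.0.0/16": "R1", "2.2.0.0/16": "R1"},
)
r2.on_update("R1", "2.2.0.0/16")
r2.on_end_of_rib("R1")
still_deferring = not r2.advertising
r2.on_end_of_rib("R3")
print(still_deferring, sorted(r2.fib), r2.advertising)
```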

At this point router R2 has finished sending all routes to R3, so it sends an End-of-RIB marker to R3:

R2 ----> R3

UPDATE
  AFI-SAFI = IPv4-Unicast
  End-of-RIB marker

Note that router R2 does not have routes to send to R1 (specifically it does not send the route for 2.2.0.0/16 back to R1 because of the AS-path loop). So, R2 immediately sends an End-of-RIB marker to R1 as well:

R2 ----> R1

UPDATE
  AFI-SAFI = IPv4-Unicast
  End-of-RIB marker

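When router R3 receives the End-of-RIB marker from R2, it flushes all routes from R2 that are still marked stale (in this case 1.1.0.0/16) from both its RIB and FIB. The route for 2.2.0.0/16 is kept because R2 re-advertised it, which cleared its stale marking.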

Router R1 does the same when it receives the End-of-RIB marker from R2, but in this example there is nothing to flush since R2 did not advertise any routes to R1.
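This final cleanup on the helper side can be sketched as two small operations: a re-advertised route replaces its stale entry, and the End-of-RIB marker triggers removal of everything still stale. The dictionary layout and function names are assumptions made for illustration.

```python
def refresh(routes, prefix, next_hop):
    """An UPDATE from the restarted peer replaces the stale entry."""
    routes[prefix] = {"next_hop": next_hop, "stale": False}


def flush_stale(routes):
    """On End-of-RIB from the restarted peer: drop what was not re-advertised."""
    flushed = [prefix for prefix, entry in routes.items() if entry["stale"]]
    for prefix in flushed:
        del routes[prefix]
    return flushed


# R3's routes from R2, all marked stale when the session went down.
r3_routes_from_r2 = {
    "1.1.0.0/16": {"next_hop": "R2", "stale": True},
    "2.2.0.0/16": {"next_hop": "R2", "stale": True},
}
refresh(r3_routes_from_r2, "2.2.0.0/16", "R2")  # R2 re-advertises 2.2.0.0/16
flushed = flush_stale(r3_routes_from_r2)        # R2 sends End-of-RIB
print(flushed, sorted(r3_routes_from_r2))
```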