Which peer will send the OPEN message first?
Normally, the speaker that opens the socket sends the first OPEN message. In practice it doesn't matter, because BGP also provides the optional DelayOpen session attribute, which lets a speaker delay its OPEN message so the opposite peer can send first:
Option 1: DelayOpen
Description: The DelayOpen optional session attribute allows
implementations to be configured to delay sending
an OPEN message for a specific time period
(DelayOpenTime). The delay allows the remote BGP
Peer time to send the first OPEN message.
Value: TRUE or FALSE
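As a rough illustration of the effect of DelayOpen, the sketch below compares how long each side waits after the TCP connection is established. The function name and the "A"/"B" labels are hypothetical, and a speaker with DelayOpen set to FALSE is modeled as a delay of 0 seconds; this is a simplification, not the RFC 4271 state machine.

```python
def first_open_sender(delay_a: float, delay_b: float) -> str:
    """Return which speaker sends its OPEN first after TCP comes up.

    delay_a / delay_b are the DelayOpenTime values in seconds; use 0
    for a speaker that has DelayOpen disabled (assumption of this sketch).
    """
    if delay_a < 0 or delay_b < 0:
        raise ValueError("delays must be non-negative")
    if delay_a < delay_b:
        return "A"
    if delay_b < delay_a:
        return "B"
    # Equal delays: both OPENs go out at about the same time.
    return "both"
```

For example, `first_open_sender(0, 5)` returns `"A"`: speaker B holds its OPEN back for 5 seconds, so A's OPEN arrives first.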
In the event that both speakers open duplicate TCP sessions and send OPEN messages on each socket simultaneously, the BGP Identifier is used to resolve which socket should be closed. See RFC 4271, Section 6.8:
6.8. BGP Connection Collision Detection
If a pair of BGP speakers try to establish a BGP connection with each other
simultaneously, then two parallel connections will be formed. If the source IP address
used by one of these connections is the same as the destination IP address used by the
other, and the destination IP address used by the first connection is the same as the
source IP address used by the other, connection collision has occurred. In the event
of connection collision, one of the connections MUST be closed.
Based on the value of the BGP Identifier, a convention is established for detecting
which BGP connection is to be preserved when a collision occurs. The convention is to
compare the BGP Identifiers of the peers involved in the collision and to retain only
the connection initiated by the BGP speaker with the higher-valued BGP Identifier.
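The convention quoted above can be sketched in a few lines. The function names are hypothetical, and the BGP Identifiers are compared as unsigned 32-bit integers rather than as strings. Rejecting equal identifiers is a choice made for this sketch only, since the quoted text does not cover that case.

```python
import ipaddress


def bgp_id_value(bgp_id: str) -> int:
    """Interpret a dotted-quad BGP Identifier as an unsigned 32-bit integer."""
    return int(ipaddress.IPv4Address(bgp_id))


def surviving_connection(local_id: str, remote_id: str) -> str:
    """Return who initiated the connection that is kept after a collision.

    The result is "local" or "remote": the connection initiated by the
    speaker with the higher-valued BGP Identifier is retained.
    """
    local, remote = bgp_id_value(local_id), bgp_id_value(remote_id)
    if local == remote:
        # Assumption of this sketch: equal identifiers are not handled.
        raise ValueError("identical BGP Identifiers are outside this sketch")
    return "local" if local > remote else "remote"
```

Note that the comparison is numeric: with identifiers 9.9.9.9 and 10.0.0.1, the speaker using 10.0.0.1 wins, even though "9.9.9.9" sorts higher as a string.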
Is there a good BGP peer FSM diagram?
Wikipedia has this simplified BGP FSM.
Best Answer
Consider the following topology
R1---R2---R3
The lines R1---R2 and R2---R3 represent both physical connections and EBGP sessions.
We are going to look at BGP UPDATEs that flow from R1 to R3, which means the corresponding data plane traffic flows from R3 to R1.
In this example R2 will be restarting.
Router R1 originates some prefixes, 1.1.0.0/16 and 2.2.0.0/16.
Router R2 installs entries in the R2 FIB to forward traffic to 1.1.0.0/16 and 2.2.0.0/16 through router R1.
Router R2 propagates the BGP UPDATEs received from R1 to R3:
Router R3 installs entries in the R3 FIB to forward traffic to 1.1.0.0/16 and 2.2.0.0/16 through router R2.
*** The control plane on router R2 goes down for some reason (crash, upgrade) ***
On the forwarding plane of router R2, traffic to 1.1.0.0/16 and 2.2.0.0/16 continues to be forwarded to router R1.
Router R3 notices that the BGP session to R2 has gone down (e.g. because of a BFD timeout, a BGP KEEPALIVE timeout, or a link-down event).
Normally, router R3 would remove the BGP routes received from R2 (namely 1.1.0.0/16 and 2.2.0.0/16) from its RIB and from its FIB.
However, because router R2 advertised that it supports graceful restart, router R3 will instead keep these routes in its RIB and FIB, marked as stale, and continue forwarding traffic through R2.
*** Topology change ***
While router R2 is down, something changes in the topology of the network.
Let's say something changes that causes router R1 to stop advertising prefix 1.1.0.0/16.
Router R1 will withdraw route 1.1.0.0/16 from its RIB and FIB, and it sends a BGP withdrawal (an UPDATE message listing the prefix as withdrawn) to all of its neighbors.
However, R1 cannot send this withdrawal to R2 because the R1-R2 BGP session is down at this point.
We will recover from this "getting out of sync" problem later on.
*** The control plane on router R2 comes back up ***
We assume it took less than 30 seconds (the value of the Restart Time field in the OPEN messages that router R2 sent at the beginning) for the control plane to come back up. If it had taken longer, router R3 would have given up and flushed the routes learned from R2 from its RIB and FIB.
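The Restart Time rule can be sketched as a small function on the helper side (R3 in this example). The function and parameter names are hypothetical. Treating an outage that lasts exactly the Restart Time as expired is an assumption of this sketch.

```python
def routes_kept_by_helper(routes, restart_time, outage_seconds):
    """Return the routes a helper still holds when the session returns.

    routes          -- prefixes learned from the restarting peer
    restart_time    -- Restart Time from the peer's OPEN, in seconds
    outage_seconds  -- how long the session stayed down
    """
    if outage_seconds >= restart_time:
        # Timer expired before the peer came back: flush everything.
        return {}
    # Peer came back in time: routes are retained but marked stale.
    return {prefix: "stale" for prefix in routes}
```

With a Restart Time of 30 seconds, an outage of 12 seconds leaves both 1.1.0.0/16 and 2.2.0.0/16 in place as stale routes, while an outage of 45 seconds leaves nothing.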
The BGP sessions come back up:
*** Resynchronization ***
Router R2 knows it restarted, so it is NOT going to send any UPDATEs until it has received all UPDATEs from its neighbors (R1 and R3), selected the best routes, and updated its RIB and FIB.
Router R1 knows that its neighbor R2 restarted, so it is going to re-send all routes to R2 followed by an End-of-RIB marker, and then flush any stale routes received from R2 from its RIB/FIB (there are none in this example).
Similarly, router R3 knows that its neighbor R2 restarted, so it is going to re-send all routes to R2 (there are none in this example), followed by an End-of-RIB marker, and then flush any stale routes received from R2 from its RIB/FIB.
So, let's walk through this in detail:
Router R1 originates prefix 2.2.0.0/16 (but not 1.1.0.0/16 anymore due to the topology change mentioned above):
Router R2 installs 2.2.0.0/16 in its RIB and in its FIB.
Router R2 already had an entry for 2.2.0.0/16 in its FIB which was marked stale. This stale marking is now removed; it is fresh again.
Router R2 does not receive an UPDATE for 1.1.0.0/16 from R1. Hence, R2 does not have an entry for 1.1.0.0/16 in its RIB. But R2 does still have an entry for 1.1.0.0/16 in its FIB, which is, and remains, marked stale.
Router R1 has finished sending all routes to R2, so it sends an End-of-RIB marker to R2:
At this point, router R2 has received an End-of-RIB marker from R1, but not yet from R3. So, it does not yet take any action (it needs to have received an End-of-RIB marker from all neighbors).
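The "wait for every neighbor" rule on the restarting speaker can be sketched as follows. The class and method names are hypothetical, and the sketch leaves out any timer that would bound the wait if a neighbor never sends its marker.

```python
class RestartingSpeaker:
    """Tracks which peers still owe an End-of-RIB marker after a restart."""

    def __init__(self, peers):
        # Peers from which no End-of-RIB marker has arrived yet.
        self.waiting_for = set(peers)

    def on_end_of_rib(self, peer) -> bool:
        """Record a marker; return True once all peers have sent one."""
        self.waiting_for.discard(peer)
        return not self.waiting_for
```

For R2 in this example, `on_end_of_rib("R1")` returns `False` because R3 is still outstanding, and the later `on_end_of_rib("R3")` returns `True`, which is the point at which R2 may select routes and start advertising.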
Now, let's look at router R3.
In this example router R3 does not have any prefixes to send to R2, so it immediately sends an End-of-RIB marker:
At this point, router R2 has received End-of-RIB markers from all of its neighbors (R1 and R3), so it will take the following actions:
Router R2 propagates the BGP UPDATEs received from R1 to R3:
At this point router R2 has finished sending all routes to R3, so it sends an End-of-RIB marker to R3:
Note that router R2 does not have routes to send to R1 (specifically it does not send the route for 2.2.0.0/16 back to R1 because of the AS-path loop). So, R2 immediately sends an End-of-RIB marker to R1 as well:
When router R3 receives the End-of-RIB marker from R2, it flushes all remaining stale routes learned from R2 (in this case 1.1.0.0/16) from both its RIB and FIB.
Router R1 does the same when it receives the End-of-RIB marker from R2, but in this example there is nothing to flush since R2 did not advertise any routes to R1.
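The stale-route handling on R3 throughout this walkthrough can be summarized in a minimal sketch. The class and method names are hypothetical, and a single table stands in for both the RIB and the FIB.

```python
class HelperRib:
    """Routes learned from one peer, each with a stale flag."""

    def __init__(self):
        self.routes = {}  # prefix -> True if stale

    def on_update(self, prefix):
        # A fresh UPDATE installs the route or clears its stale mark.
        self.routes[prefix] = False

    def peer_restarting(self):
        # Session lost to a graceful-restart-capable peer: keep, mark stale.
        for prefix in self.routes:
            self.routes[prefix] = True

    def on_end_of_rib(self):
        # Anything still stale was not re-advertised: flush it.
        flushed = sorted(p for p, stale in self.routes.items() if stale)
        for prefix in flushed:
            del self.routes[prefix]
        return flushed
```

Replaying the example: R3 learns 1.1.0.0/16 and 2.2.0.0/16, marks both stale when R2 restarts, receives a fresh UPDATE for 2.2.0.0/16 only, and on End-of-RIB flushes 1.1.0.0/16 while keeping 2.2.0.0/16.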