The VPC peer link has a few special roles: it carries the CFSoE traffic (more on CFS below) that lets the peers stay synchronized (enabled automatically when feature vpc is turned on), and it carries traffic for orphan ports. It doesn't behave entirely like a normal port channel, so it's best practice to keep non-VPC VLANs on a different physical link. One significant difference, as mentioned, is that broadcast traffic sent over the peer link is normally not forwarded out any non-orphan member link on the far side - the peer has already flooded it - while orphan ports still need to receive those broadcasts. This design best practices guide is for the 7K but applies more or less equally to the 5K, 9K, or 3K.
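As a minimal sketch of that setup (domain ID, addresses, interface numbers, and the VLAN range are all invented for illustration; adjust to your environment):

```
feature vpc

vpc domain 10
  peer-keepalive destination 192.0.2.2 source 192.0.2.1 vrf management

interface port-channel1
  switchport mode trunk
  ! Keep non-VPC VLANs off the peer-link trunk per the best practice above
  switchport trunk allowed vlan 10-20
  vpc peer-link
```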
CFS is basically the mechanism by which state is synchronized between the VPC peers. It's how parameters are exchanged to ensure configuration consistency, where STP configuration is normalized, where multicast/IGMP information is programmed and, if configured, how IP ARP and/or IPv6 ND information is preemptively synchronized. What CFS unequivocally does not do is track, on a per-packet basis, which traffic should and should not traverse the peer link. Its job is to maintain synchronization of configuration state, not real-time traffic control.
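The ARP/ND synchronization mentioned above is the opt-in part; the rest happens automatically. A sketch, assuming the same hypothetical domain 10:

```
vpc domain 10
  ! Pre-sync ARP and ND tables across the peers via CFSoE so the
  ! other peer can forward immediately after a failover
  ip arp synchronize
  ipv6 nd synchronize
```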
To your specific question -
The assumption in a VPC environment is that under normal operating conditions all devices have member links to both VPC peers. As such, a broadcast received on NXOS1 in your diagram would be sent out the other member links on that same switch (i.e. toward SW2) and also across the peer link to NXOS2. NXOS2 will not send a broadcast received on the peer link out any of its VPC member ports (as, again, it's assumed NXOS1 has already done this). If it has any orphan ports, however, it will replicate to them. As an example, imagine the link from SW1 to NXOS1 has failed. SW2 sends a broadcast that happens to hash to NXOS1. NXOS1 forwards it across the peer link to NXOS2, which knows (via CFS) that the peer's link to SW1 is down and therefore sends the broadcast out its own member link to SW1.
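That replication rule can be sketched in a few lines of pseudologic (my own illustration, not Cisco code; port names are hypothetical):

```python
def flood_targets(rx_port, member_ports, peer_link, orphan_ports, peer_member_up):
    """Return the ports a broadcast received on rx_port is replicated to.

    peer_member_up maps each local VPC member port to whether the PEER
    switch's corresponding member link is up (state learned via CFS).
    """
    if rx_port == peer_link:
        # Loop-prevention rule: frames arriving on the peer link are not
        # sent out member ports whose peer-side member is up (the peer
        # already flooded them), but orphans still get a copy - as does
        # any member whose peer-side link has failed.
        return [p for p in member_ports if not peer_member_up[p]] + orphan_ports
    # Normal case: flood to all other members, orphans, and the peer link.
    return [p for p in member_ports if p != rx_port] + orphan_ports + [peer_link]
```

With both peer-side members up, a peer-link broadcast goes only to orphans; with the peer's SW1 link down, the local SW1 member is added, matching the failure example above.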
To your second question - if the VPC peer link fails but the keepalive is still up, the VPC peer operating as secondary will shut down its VPC member ports (and VPC VLAN SVIs), forcing the downstream switches to use only their links to the VPC primary. There are knobs to modify this - to, say, allow orphans or some SVIs to stay up - but these should be approached with extreme caution, as you could easily end up with the secondary switch attracting L3 traffic from upstream that it can no longer actually reach at L2.
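Two of those knobs, sketched with invented VLAN and interface numbers:

```
vpc domain 10
  ! Keep these SVIs up on the secondary even when VPC ports are suspended
  dual-active exclude interface-vlan 10,20

interface Ethernet1/10
  ! Inverse knob: suspend this orphan port too when the secondary
  ! suspends its VPC member ports (useful for dual-homed hosts)
  vpc orphan-port suspend
```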
EDIT (to answer question in comments)
The question of what goes wrong in a split-brain situation (i.e. both the peer link and the keepalive failing) depends on the surrounding topology. In the topology you've drawn - only L2 devices, both downstream switches dual-connected to both VPC peers, and no other links - there isn't a ton of risk of loops, provided the downstream switches continue to see their member links as correctly aggregated, since split horizon would still be maintained for broadcasts. There could be some protocol issues, though: the downstream switches would see BPDUs with different sources arriving on different members of the same port channel, and how that plays out depends on how VPC was configured on the two switches.
Dual, independent (non-synchronized) IGMP would break some things: both VPC peers would perform the same function independently, and you could see duplicate multicast packets flooded (this is one of those important things that CFS handles over the peer link).
Now - if the topology is more involved, it can get much uglier. Add L3 on that VPC pair and perhaps some orphan links, and all of a sudden you're dropping traffic as two routers advertise reachability to subnets they can each only partially see. Put a link between SW1 and SW2 and you could end up with an unstable network as they keep generating TCNs while the two VPC peers send BPDUs with conflicting information. Or imagine a back-to-back VPC where the upstream VPC pair is now hashing traffic (including broadcast/multicast) across the two split-brained VPC switches, which in turn flood the same packet across various downstream member links. Plenty of opportunity for duplicated broadcasts, improper flooding, etc.
Finally - if any of these issues causes LACP to break, or if seeing differing LACP bundle information on different channel members breaks the downstream switch, then you've got a situation where the downstream switches are potentially forwarding between the VPC switches. That's definitely ripe for an L2 loop.
These types of scenarios are why the best practice documents so stringently emphasize physically redundant peer links (two links on a 5K; spread across multiple modules on a 9500 or 7K) and why running the keepalive traffic over a completely independent set of links (and infrastructure) is so crucial. As with any active/active clustering system, a split-brain is always dangerous.
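A sketch of the redundant peer-link membership (interface numbers are invented; the point is that the members sit on two different line cards):

```
! Peer-link members deliberately spread across two modules so that a
! single line-card failure cannot take down the whole peer link
interface Ethernet1/1, Ethernet2/1
  channel-group 1 mode active
```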
You need to verify whether those devices
- really don't send any xSTP BPDUs of their own (verify by packet capture) - you can't always trust the documentation
- possibly pass through xSTP BPDUs - 802.1D-compliant bridges must consume (not forward) these BPDUs, but there are many switches around (especially small ones) that do forward them.
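When you do capture, IEEE BPDUs are easy to spot: destination MAC 01:80:C2:00:00:00 with LLC DSAP/SSAP 0x42. A minimal sketch of that check, assuming you can feed it raw frame bytes from your capture tool:

```python
# IEEE bridge group address used by 802.1D/w/s BPDUs
STP_DEST_MAC = bytes.fromhex("0180c2000000")

def is_bpdu(frame: bytes) -> bool:
    """True if a raw Ethernet frame looks like an IEEE STP/RSTP/MSTP BPDU:
    destination 01:80:C2:00:00:00 and LLC DSAP/SSAP both 0x42.
    Note: Cisco (R)PVST+ instead uses 01:00:0C:CC:CC:CD with SNAP."""
    return (len(frame) >= 17
            and frame[0:6] == STP_DEST_MAC
            and frame[14] == 0x42 and frame[15] == 0x42)
```

If frames matching this appear on the far side of a device that supposedly doesn't speak STP, the device is passing BPDUs through.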
If they really don't participate in xSTP but do pass BPDUs through, your tree should reconverge very quickly - so I have some doubts that's what's happening.
If they neither send BPDUs nor forward them, your STP scheme is broken and cannot work. However, that would leave the loop undetected and cause a broadcast storm, so it's pretty obviously not the case.
I suspect there are different STP schemes at work.
- Classic (and obsolete) STP interoperates with RSTP and MSTP fairly well but slows them down (so that seems likely).
- RSTP and MSTP interoperate perfectly.
- (R)PVST flavors interoperate badly with anything IEEE in most cases, so that should be avoided.
- Something completely different could be at work (e.g. proprietary probing frames) that isn't documented and that you haven't discovered yet.
Additionally, RSTP is designed for a tree topology with a default maximum diameter of seven bridges. Assuming the devices participate in STP, your chain exceeds that maximum, which causes convergence problems.
You'd need to reduce the Hello Time (e.g. to 1 s) to speed up propagation - depending on what those devices actually do, it may or may not help.
Alternatively, you could increase the Max Age, but that slows convergence.
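On Cisco gear the per-VLAN knobs look like this (VLAN number is invented; check your platform's ranges, and keep the 802.1D constraint that Max Age must be at least twice the Hello Time plus one second):

```
! Faster detection: halve the hello interval from the default 2 s
spanning-tree vlan 10 hello-time 1

! Tolerate a deeper tree: raise Max Age from the default 20 s
spanning-tree vlan 10 max-age 30
```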
Best Answer
There is no theoretical limit, but memory and CPU impose practical limits on particular hardware. Depending on the topology, increasing the number of devices increases the convergence time; when that becomes a problem depends on your particular situation.
In my experience, loops tend to be small. I don't recall ever seeing a looped configuration larger than 3 or 4 devices.