BGP signalled MPLS issues

cisco-ios mpls vpls

I am having trouble bringing up a VPLS on an ME3600X running 15.3(2) with BGP signalling.
@yelfathi and @packettalk have pointed me in the right direction, but I am not sure how to resolve it.

I have two ME3600 switches acting as PEs (using RSVP) and two MX80 routers acting as P routers.

The P routers correctly show the LSPs as transit LSPs, and my TE tunnels are up. On both switches I get the following:

SW1.THN-LON#show mpls L2transport vc 501 detail
Local interface: VFI FLVPLS001 vfi up
Interworking type is Ethernet
Destination address: 46.226.0.10, VC ID: 501, VC status: down
Last error: MPLS dataplane reported a fault to the nexthop
Output interface: none, imposed label stack {}
Preferred path: not configured
Default path: no route
No adjacency
Create time: 00:19:42, last status change time: 00:19:42
Last label FSM state change time: 00:19:42
Signaling protocol: BGP
Status TLV support (local/remote)   : Not Applicable
  LDP route watch                   : Not Applicable
  Label/status state machine        : activating, LruRruD
  Last local dataplane   status rcvd: DOWN(pw-tx-fault)
  Last BFD dataplane     status rcvd: Not Applicable
  Last BFD peer monitor  status rcvd: Not Applicable
  Last local AC  circuit status rcvd: No fault
  Last local AC  circuit status sent: DOWN(pw-rx-fault)
  Last local PW i/f circ status rcvd: No fault
  Last local LDP TLV     status sent: Not Applicable
  Last remote LDP TLV    status rcvd: Not Applicable
  Last remote LDP ADJ    status rcvd: Not Applicable
MPLS VC labels: local 27, remote 16
Group ID: local 0, remote 0
MTU: local 1500, remote 1500
Control Word: Off
Dataplane:
SSM segment/switch IDs: 0/10075 (used), PWID: 12
VC statistics:
transit packet totals: receive 0, send 0
transit byte totals:   receive 0, send 0
transit packet drops:  receive 0, seq error 0, send 0


SW1.THN-LON#show mpls for 46.226.0.10 detail
Local      Outgoing   Prefix           Bytes Label   Outgoing   Next Hop  
Label      Label      or Tunnel Id     Switched      interface            
None       304720     46.226.0.10/32   0             Tu0        point2point
MAC/Encaps=14/18, MRU=9000, Label Stack{304720}, via Te0/2
A8D0E55DEB3D88F077938CDB8847 4A650000

The pseudowires are up, but the VC is still down.

When I run the following command, it comes back with "no FEC mapping".

SW1.THN-LON#ping mpls ipv4 46.226.0.10/32 
Sending 5, 100-byte MPLS Echos to Target FEC Stack TLV descriptor, 
 timeout is 2 seconds, send interval is 0 msec:

Codes: '!' - success, 'Q' - request not sent, '.' - timeout,
'L' - labeled output interface, 'B' - unlabeled output interface, 
'D' - DS Map mismatch, 'F' - no FEC mapping, 'f' - FEC mismatch,
'M' - malformed request, 'm' - unsupported tlvs, 'N' - no label entry, 
'P' - no rx intf label prot, 'p' - premature termination of LSP, 
'R' - transit router, 'I' - unknown upstream index,
'l' - Label switched with FEC change, 'd' - see DDMAP for return code,
'X' - unknown return code, 'x' - return code 0

Type escape sequence to abort.
FFFFF
Success rate is 0 percent (0/5)
Total Time Elapsed 48 ms
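
For reading longer result strings than the one above, here is a minimal Python sketch that tallies a ping mpls result line against the code legend printed by the router. The legend entries are copied from the output above; the function name summarize_ping_mpls is made up for illustration and is not part of any Cisco tooling.

```python
from collections import Counter

# Return-code legend copied from the `ping mpls` output in the question.
PING_MPLS_CODES = {
    "!": "success",
    "Q": "request not sent",
    ".": "timeout",
    "L": "labeled output interface",
    "B": "unlabeled output interface",
    "D": "DS Map mismatch",
    "F": "no FEC mapping",
    "f": "FEC mismatch",
    "M": "malformed request",
    "m": "unsupported tlvs",
    "N": "no label entry",
    "P": "no rx intf label prot",
    "p": "premature termination of LSP",
    "R": "transit router",
    "I": "unknown upstream index",
    "l": "Label switched with FEC change",
    "d": "see DDMAP for return code",
    "X": "unknown return code",
    "x": "return code 0",
}


def summarize_ping_mpls(result_line):
    """Count each return code in a result line such as 'FFFFF'.

    Hypothetical helper: returns {meaning: count}. Characters that are
    not in the legend are reported as unrecognized rather than dropped.
    """
    counts = Counter(result_line.strip())
    return {
        PING_MPLS_CODES.get(code, f"unrecognized code {code!r}"): n
        for code, n in counts.items()
    }


print(summarize_ping_mpls("FFFFF"))  # {'no FEC mapping': 5}
```

Here all five echoes came back as 'F', which matches the 0 percent success rate shown above.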

The relevant parts of the switch config are as follows:

interface Tunnel0
 description sw1.thn-lon-to-sw1.sco-edi
 ip unnumbered Loopback0
 tunnel mode mpls traffic-eng
 tunnel destination 46.226.0.10
 tunnel mpls traffic-eng autoroute announce
 tunnel mpls traffic-eng priority 1 1
 tunnel mpls traffic-eng path-option 1 dynamic

mpls traffic-eng tunnels
l2vpn vfi context FLVPLS001 
 vpn id 501
 autodiscovery bgp signaling bgp 
  ve id 1
  rd 56595:501
  route-target export 56595:501
  route-target import 56595:501

bridge-domain 501 
 member GigabitEthernet0/1 service-instance 2


interface GigabitEthernet0/1
 description customer xxx
 switchport trunk allowed vlan none
 switchport mode trunk
 mtu 1600
 speed 100
 duplex full
 service instance 1 ethernet
  description Cust: xxx
  encapsulation dot1q 400
  rewrite ingress tag pop 1 symmetric
 !        
 service instance 2 ethernet
  description Oil and Gas VPLS
  encapsulation dot1q 501
  rewrite ingress tag pop 1 symmetric

interface Vlan501
 no ip address
 member vfi FLVPLS001

Any help or suggested commands would be greatly appreciated.

router bgp 56595  
 bgp log-neighbor-changes  
 bgp graceful-restart restart-time 120  
 bgp graceful-restart stalepath-time 360  
 bgp graceful-restart  
 neighbor 46.226.0.10 remote-as 56595  
 neighbor 46.226.0.10 update-source Loopback0  
 !  
 address-family l2vpn vpls  
  neighbor 46.226.0.10 activate  
  neighbor 46.226.0.10 send-community extended  
  neighbor 46.226.0.10 suppress-signaling-protocol ldp  
 exit-address-family  

I have added the OSPF config for completeness.

router ospf 1
 redistribute connected subnets
 redistribute static subnets
 network 46.226.6.80 0.0.0.3 area 0.0.0.0
 mpls traffic-eng router-id Loopback0
 mpls traffic-eng area 0.0.0.0

Some events from debug atom events:

*Jun  8 05:31:57.748: AToM[46.226.0.12, 501]: Evt dataplane reactivateS, in activating
*Jun  8 05:31:57.748: AToM[46.226.0.12, 501]: . Activate dataplane
*Jun  8 05:31:57.748: AToM[46.226.0.12, 501]: . Need to setup the dataplane
*Jun  8 05:31:57.748: AToM[46.226.0.12, 501]: . Setup dataplane, PWID 21
*Jun  8 05:31:57.748: AToM[46.226.0.12, 501]: .. Provision SSM with PWID 21, VC ID 501, Block ID 3
*Jun  8 05:31:57.748: AToM[46.226.0.12, 501]: .. Set imp flags: ra
*Jun  8 05:31:57.748: AToM[46.226.0.12, 501]: ..              : nsf
*Jun  8 05:31:57.748: AToM[46.226.0.12, 501]: .. No signal context found, defaulting to single segment PW
*Jun  8 05:31:57.748: AToM[46.226.0.12, 501]: .. Provision SSM with 7540/20332 (sw/seg)
*Jun  8 05:31:57.748: AToM[46.226.0.12, 501]: Receive SSM dataplane unavailable notification
*Jun  8 05:31:57.748: AToM[46.226.0.12, 501]: Evt dataplane downS, in activating
*Jun  8 05:31:57.748: AToM[46.226.0.12, 501]: . Dataplane unavailable
*Jun  8 05:31:57.748: AToM[46.226.0.12, 501]: . Set last error: MPLS dataplane reported a fault to the nexthop
*Jun  8 05:31:57.748: AToM[46.226.0.12, 501]: . Notify dataplane down
*Jun  8 05:31:57.748: AToM[46.226.0.12, 501]: Deactivating data plane
*Jun  8 05:31:57.748: AToM[46.226.0.12, 501]: Notify dataplane down
*Jun  8 05:31:57.748: AToM[46.226.0.12, 501]: Unprovision and deallocate SSM segment
*Jun  8 05:31:57.748: AToM[46.226.0.12, 501]: Added vc to 60 sec retry queue
*Jun  8 05:31:57.748: AToM[46.226.0.12, 501]: Event provision retry already in retry queue

BGP output:

SW1.THN-LON#show bgp l2vpn vpls all
BGP table version is 17, local router ID is 46.226.0.12
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal, 
          r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter, 
          x best-external, a additional-path, c RIB-compressed, 
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found

 Network          Next Hop            Metric LocPrf Weight Path
Route Distinguisher: 56595:501
 *>  56595:501:VEID-1:Blk-1/136
                   0.0.0.0                            32768 ?
 *>i 56595:501:VEID-2:Blk-1/136
                   46.226.0.10              0    100      0 ?

After some help from @ytti, we have narrowed it down to a bug in the ME3600 regarding BGP signalling.

We changed signalling to be LDP and it worked fine. Will post an update once resolved with Cisco.

Best Answer

Revising answer/consolidating comments and taking another stab (disclaimer: my MPLS/VPLS experience is not on the Cisco platform). Trying to infer how your topology looks:

ME3600<--->MX80<--->MX80<--->ME3600

You said you were running OSPF between all of these. Have you double-checked that you have a full mesh of LSPs between your devices? (This is a requirement for vanilla VPLS.) Have you configured mpls traffic-eng area x and mpls traffic-eng router-id Loopback0 within router ospf on the ME3600s?

I'm assuming that Tunnel0 is actually representative of an LSP (I know the term "Pseudowire" has been thrown around in this question, but to me Pseudowire = L2VPN/EoMPLS/VLL). LSPs are unidirectional, so each PE needs its own LSP towards the other. Without a full mesh of LSPs, it's possible that the ME3600s can see each other's loopbacks in the routing table while the P routers are interfering with the RSVP signaling or dropping traffic on the floor.
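
To make the full-mesh point concrete, here is a small Python sketch that lists which unidirectional LSPs are missing between a set of PEs. The two loopback addresses are taken from the question; the LSP list is a made-up example, not the poster's actual state, and missing_lsps is a hypothetical helper name.

```python
from itertools import permutations


def missing_lsps(pe_routers, lsps):
    """Return the (head, tail) pairs that have no LSP.

    Because LSPs are unidirectional, (a, b) and (b, a) are checked
    separately. An empty list means the PEs form a full mesh.
    """
    have = set(lsps)
    return [pair for pair in permutations(pe_routers, 2) if pair not in have]


# Loopbacks from the question; the single LSP below is an assumed example.
pes = ["46.226.0.12", "46.226.0.10"]
lsps = [("46.226.0.12", "46.226.0.10")]

print(missing_lsps(pes, lsps))  # [('46.226.0.10', '46.226.0.12')]
```

In this example the tunnel towards 46.226.0.10 exists but the return LSP does not, which is the kind of gap worth ruling out on both ME3600s.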
