The difference between aggregate labels and normal labels is that normal labels point directly to L2 rewrite details (an egress interface and L2 address). This means the egress PE node label-switches a packet carrying a normal label directly out, without doing an IP lookup.
Conversely, aggregate labels can potentially represent many different egress options, so L2 rewrite information is not associated with the label itself. This means that the egress PE node must perform an IP lookup for the packet to determine the appropriate L2 rewrite information.
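To make the difference concrete, here is a minimal Python sketch of the egress PE decision. It assumes a toy LFIB where a normal label maps straight to a rewrite (interface and MAC) and an aggregate label maps only to a VRF name; the label values, interface names, the VRF name `CUST-A` and all addresses are hypothetical.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network

@dataclass(frozen=True)
class Rewrite:
    """L2 rewrite details: egress interface and next-hop MAC address."""
    interface: str
    mac: str

# Hypothetical egress PE state. A normal label maps straight to a rewrite;
# an aggregate (here per-VRF) label only identifies the VRF to look up in.
LFIB = {
    16001: Rewrite("ge-0/0/1", "00:00:5e:00:53:01"),  # normal label
    16002: "CUST-A",                                  # aggregate label
}
VRF_FIB = {
    "CUST-A": {
        ip_network("198.51.100.0/25"): Rewrite("ge-0/0/2", "00:00:5e:00:53:02"),
        ip_network("198.51.100.128/25"): Rewrite("ge-0/0/3", "00:00:5e:00:53:03"),
    },
}

def egress_lookup(label: int, dst: str) -> Rewrite:
    entry = LFIB[label]
    if isinstance(entry, Rewrite):
        # Normal label: the label lookup alone yields the rewrite; dst is ignored.
        return entry
    # Aggregate label: longest-prefix-match IP lookup in the VRF.
    addr = ip_address(dst)
    matches = [net for net in VRF_FIB[entry] if addr in net]
    if not matches:
        raise LookupError(f"no route to {dst} in VRF {entry}")
    return VRF_FIB[entry][max(matches, key=lambda net: net.prefixlen)]
```

With label 16001 the destination address never matters; with label 16002 the same label ends up on `ge-0/0/2` or `ge-0/0/3` depending on the IP lookup.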
Typical reasons why you might have an aggregate label instead of a normal label are:
- Need to perform neighbor discovery (IPv4 ARP, IPv6 ND)
- Need to perform an ACL lookup (egress ACL on the customer interface)
- Running the whole VRF under a single label (table-label)
Some of these restrictions (particularly the second one) do not apply to all platforms.
Traceroute is affected in an MPLS VPN environment because the transit P node, when generating the TTL exceeded message, does not know how to return it (it has no routing table entry for the sender). So the transit P node sends the TTL exceeded message with the original label stack all the way to the egress PE node, in the hope that the egress PE node knows how to return the TTL exceeded message to the sender.
This feature is enabled by default in Cisco IOS but requires 'icmp-tunneling' to be configured in Juniper JunOS.
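A rough Python sketch of what a transit P node does when a labelled packet's TTL expires, following the behaviour described above. `has_route_to` and `tunnel` are hypothetical stand-ins for the node's routing table and the ICMP tunnelling setting, and the order of checks is an assumption for illustration, not any vendor's actual code. Real implementations also set TTLs and build the ICMP payload, which is not modelled.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

@dataclass
class Packet:
    labels: list  # label stack, top of stack first
    src: str
    dst: str
    kind: str = "data"

def handle_ttl_expiry(pkt: Packet, node_addr: str,
                      has_route_to: Callable[[str], bool],
                      tunnel: bool) -> Tuple[str, Optional[Packet]]:
    # The TTL exceeded message is addressed from this P node to the original sender.
    reply = Packet(labels=[], src=node_addr, dst=pkt.src, kind="icmp-ttl-exceeded")
    if has_route_to(pkt.src):
        # The node can reach the sender itself.
        return "route-directly", reply
    if tunnel:
        # No route to the sender: reuse the original label stack so the reply
        # follows the LSP to the egress PE, which hopefully can route it back.
        reply.labels = list(pkt.labels)
        return "forward-along-lsp", reply
    # No route and no tunnelling: the message cannot be delivered.
    return "drop", None
```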
Based on this, I suspect that your CE devices may not be accepting packets whose source address is in a P node link network, and since they do not accept the ICMP message, they cannot return it to the sender.
A possible way to test this theory would be to enable per-VRF labels (these are aggregate labels, so the egress PE does an IP lookup and returns the message itself instead of switching it out to the CE):
- IOS: mpls label mode all-vrfs protocol bgp-vpnv4 per-vrf
- JunOS: set routing-instances FOO vrf-table-label
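The theory can be modelled in a few lines of Python. This is only a sketch: it assumes the CE drops anything sourced from a P link network (the hypothetical 192.0.2.0/24 here) and that, with a per-VRF label, the egress PE routes the reply back through the VRF without involving the CE.

```python
from ipaddress import ip_address, ip_network

# Hypothetical core (P) link addressing, for illustration only.
P_LINK_NETS = [ip_network("192.0.2.0/24")]

def ce_accepts(src: str) -> bool:
    # Suspected CE behaviour: drop anything sourced from a P link network.
    return not any(ip_address(src) in net for net in P_LINK_NETS)

def icmp_reply_fate(label_is_aggregate: bool, icmp_src: str) -> str:
    """Fate of a tunnelled TTL exceeded reply once it reaches the egress PE."""
    if label_is_aggregate:
        # Per-VRF label: the PE does an IP lookup on the reply's destination
        # (the original sender) and sends it back across the VPN itself.
        return "returned-by-egress-pe"
    # Normal label: the PE label-switches the reply straight out to the CE,
    # which must route it back towards the sender.
    return "returned-by-ce" if ce_accepts(icmp_src) else "dropped-by-ce"
```

If traceroute starts working after the change, that points at the CE filtering.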
Generally speaking, I do not recommend propagating TTL, especially in a VPN environment; at least in our case, customers get confused and anxious about it. They worry about why foreign addresses are showing up in their VPN.
Another thing that confuses people into opening support tickets is running a traceroute from, say, the UK to the US: they see >100 ms latency between two core routers in the UK, not realizing that every hop shows the same latency all the way to the west coast of the US, because all the TTL exceeded messages take a detour via the egress PE there.
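The arithmetic behind the confusing numbers, as a small Python sketch. It assumes symmetric one-way link delays, that the tunnelled reply returns to the sender over a path of the same length, and ignores processing time; the delays in the example are made up.

```python
from itertools import accumulate

def traceroute_rtts(one_way_ms, tunnelled):
    """Reported RTT for each hop; one_way_ms[i] is the one-way delay of link i."""
    to_hop = list(accumulate(one_way_ms))  # sender -> hop i
    total = to_hop[-1]                     # sender -> egress PE
    if tunnelled:
        # Probe reaches hop i, the reply is carried on to the egress PE,
        # and the egress PE sends it back over the whole path to the sender.
        return [t + (total - t) + total for t in to_hop]
    # Reply returned directly from hop i.
    return [2 * t for t in to_hop]
```

For example, with hypothetical one-way delays of 1, 1, 35 and 30 ms, `traceroute_rtts([1, 1, 35, 30], tunnelled=True)` gives 134 ms for every hop, while the untunnelled version gives 2, 4, 74 and 134 ms.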
This issue is mostly unfixable by design; however, in IOS you can set the maximum number of labels to pop (mpls ip ttl-expiration pop N) when generating a TTL exceeded message. This gives a reasonable approximation if Internet traffic carries 1 label and VPN traffic carries more than 1 label: you can configure it so that VPN traffic is tunnelled and Internet traffic is returned directly, without the egress PE node detour. But as I said, this is only an approximation of the desired functionality, as features like in-transit repairs may cause the label stack not to always be the same size for the same service.
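The decision behind `mpls ip ttl-expiration pop N` can be sketched in Python like this, assuming stacks of at most N labels are answered directly and deeper stacks are tunnelled. The sample stack depths are hypothetical and show where the approximation breaks down.

```python
def ttl_expiry_action(label_stack_depth: int, pop_n: int) -> str:
    """How a P node returns TTL exceeded with 'mpls ip ttl-expiration pop N'.

    Assumed behaviour: stacks of at most pop_n labels are popped and the reply
    is routed directly; deeper stacks are tunnelled via the egress PE.
    """
    if label_stack_depth <= pop_n:
        return "reply-directly"
    return "tunnel-via-egress-pe"

# Hypothetical label stack depths seen at a P node, with N = 1.
SAMPLES = {
    "internet": 1,                # transport label only
    "vpn": 2,                     # transport label + VPN label
    "internet-during-repair": 2,  # extra label pushed by an in-transit repair
}

if __name__ == "__main__":
    for service, depth in SAMPLES.items():
        print(service, ttl_expiry_action(depth, pop_n=1))
```

The last sample is the catch: Internet traffic that picked up an extra label in transit gets the detour anyway.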
A label is either an aggregate label or a normal label. An aggregate label has no rewrite information attached to it, so it knows neither the egress interface nor the egress MAC address. Aggregate labels are used, for example, for connected networks.
An aggregate label implies that you do not know the egress information after the MPLS lookup, so you must do a normal IP lookup to determine it.
A normal label has egress rewrite information attached to it; that is, a lookup against the label returns the egress interface along with all the necessary information, such as MAC address, VLAN, etc.
Let's assume all links have IGP metric 1, except B-C, which has metric 2.
For A to send to E's loopback (192.0.2.5), the following will happen:
- E will allocate either explicit null (label 0) or implicit null (the default, label 3) for 192.0.2.5/32
- E will distribute the prefix (FEC) and its label to C and D, using LDP
- C will allocate local label for this, say 100 (could be anything)
- C will program FIB entry, so that label 100 points to interface towards E, and MPLS label operation 'SWAP 0' if explicit null, or 'POP' if implicit null
- C will program FIB entry, so that prefix 192.0.2.5/32 points to interface towards E, and MPLS label operation 'PUSH 0' if explicit null
- D will allocate local label for this, say 200 (could be anything, even 100, 300, 400)
- D will program FIB entry, so that label 200 points to interface towards E, and MPLS label operation 'SWAP 0' if explicit null, or 'POP' if implicit null
- D will program FIB entry, so that prefix 192.0.2.5/32 points to interface towards E, and MPLS label operation 'PUSH 0' if explicit null
- D and C will distribute the prefix+label to B, using LDP
- B will allocate local label for this, say 300 (could be anything)
- B will program FIB entry, so that label 300 points to interface towards D (because of IGP metric!), and MPLS label operation 'SWAP 200'
- B will program FIB entry, so that prefix 192.0.2.5/32 points to interface towards D, and MPLS label operation 'PUSH 200'
- B will distribute the prefix+label to A, using LDP
- A will allocate local label for this, say 400 (could be anything)
- A will program FIB entry, so that label 400 points to interface towards B, and MPLS label operation 'SWAP 300'
- A will program FIB entry, so that prefix 192.0.2.5/32 points to interface towards B, and MPLS label operation 'PUSH 300'
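The control-plane steps above can be reproduced with a small Python sketch: shortest paths towards E are computed with Dijkstra, and each node's outgoing label is whatever its downstream next hop advertised. It assumes symmetric link metrics; the local label values are the arbitrary ones used in the text.

```python
import heapq

IMPLICIT_NULL = 3  # reserved label value for implicit null

# Topology from the example: all links metric 1 except B-C (metric 2).
LINKS = {("A", "B"): 1, ("B", "C"): 2, ("B", "D"): 1, ("C", "E"): 1, ("D", "E"): 1}
LOCAL_LABELS = {"C": 100, "D": 200, "B": 300, "A": 400}  # arbitrary, as in the text

def neighbours(node):
    for (a, b), metric in LINKS.items():
        if a == node:
            yield b, metric
        elif b == node:
            yield a, metric

def next_hops_towards(egress):
    """Dijkstra from the egress node; returns {node: next hop towards egress}."""
    dist, nh = {egress: 0}, {}
    heap = [(0, egress)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry
        for nbr, metric in neighbours(node):
            if d + metric < dist.get(nbr, float("inf")):
                dist[nbr] = d + metric
                nh[nbr] = node  # nbr reaches egress via node (metrics are symmetric)
                heapq.heappush(heap, (d + metric, nbr))
    return nh

def build_tables(egress, explicit_null=False):
    """Per-node LFIB (local label -> next hop, op) and FIB entry for egress's loopback."""
    advertised = dict(LOCAL_LABELS)
    advertised[egress] = 0 if explicit_null else IMPLICIT_NULL
    tables = {}
    for node, nhop in next_hops_towards(egress).items():
        out_label = advertised[nhop]  # label the downstream neighbour advertised
        if out_label == IMPLICIT_NULL:
            swap_op, push_op = "POP", None  # penultimate hop popping, no label pushed
        else:
            swap_op, push_op = f"SWAP {out_label}", f"PUSH {out_label}"
        tables[node] = {"lfib": {LOCAL_LABELS[node]: (nhop, swap_op)}, "fib": (nhop, push_op)}
    return tables
```

Calling `build_tables("E")` should give B the LFIB entry `{300: ("D", "SWAP 200")}` and A the FIB entry `("B", "PUSH 300")`, matching the walkthrough; passing `explicit_null=True` turns C's and D's 'POP' into 'SWAP 0'.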
Now, what happens in the forwarding plane when A sends to 192.0.2.5/32:
- A will PUSH (impose) label 300 and send towards B
- B will consult FIB for 300, which is Interface D and SWAP 200
- D will consult FIB for 200, which is Interface E and POP (or SWAP 0)
- E will receive the frame, either unlabelled (implicit null) or with label 0 (explicit null), and do an IP lookup
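A Python sketch of the same forwarding walk, using the tables programmed in the example (implicit null case). Keeping the label stack as a list with the top of stack at the end is just a modelling choice.

```python
# Tables programmed in the example above (implicit null case).
FIB = {"A": ("B", 300)}  # ingress: 192.0.2.5/32 -> (next hop, label to push)
LFIB = {
    "B": {300: ("D", ("SWAP", 200))},
    "C": {100: ("E", ("POP", None))},
    "D": {200: ("E", ("POP", None))},
}

def forward(ingress):
    """Follow a packet for 192.0.2.5 from the ingress LSR; return the hop trace."""
    nhop, label = FIB[ingress]
    stack = [label]  # ingress imposes the label learned from its next hop
    trace = [(ingress, "PUSH", label)]
    node = nhop
    while stack:
        top = stack.pop()
        nhop, (op, out) = LFIB[node][top]
        if op == "SWAP":
            stack.append(out)
        trace.append((node, op, out))
        node = nhop
    trace.append((node, "IP lookup", None))  # E receives an unlabelled packet
    return trace
```

`forward("A")` walks A (PUSH 300), B (SWAP 200), D (POP) and ends with the IP lookup on E; C is never used because of the B-C metric.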
Yes, both LSR1 and LSR2 could advertise a given FEC, say 10.0.0.1/32, to each other with label 10.
Then, if the IGP tells LSR1 that the egress interface for 10.0.0.1/32 is towards LSR2, LSR1 will impose (or swap to) label 10 and send the packet towards LSR2. LSR2 will then find that its egress interface is something other than the one towards LSR1, and swap the label to whatever that neighbor has advertised; it might still be label 10, or it might be something else, and it does not really matter at all.
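A tiny Python illustration of why the coincidence does not matter: each LSR interprets an incoming label only in its own label space. The downstream neighbour `LSR3` and its label 42 are hypothetical.

```python
# Hypothetical LFIBs: both LSRs picked local label 10 for 10.0.0.1/32.
# Each entry: incoming label -> (next hop, outgoing label).
LFIB = {
    "LSR1": {10: ("LSR2", 10)},  # IGP says via LSR2, which advertised 10
    "LSR2": {10: ("LSR3", 42)},  # via a hypothetical LSR3, which advertised 42
}

def label_switch(ingress, label, hops):
    """Switch a packet through `hops` LSRs, each using only its own LFIB."""
    node, trace = ingress, []
    for _ in range(hops):
        next_node, label = LFIB[node][label]
        trace.append((node, next_node, label))
        node = next_node
    return trace
```

`label_switch("LSR1", 10, 2)` sends the packet from LSR1 to LSR2 with label 10, and LSR2 then forwards it to LSR3 with label 42; the packet never loops back to LSR1.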
Labels are completely local today, and some RFCs dictate that this is how it should be. Personally, I'd like IGP labels to be global for simplicity. Because an MPLS LSR does not know what labels look like from anyone else's point of view, we need hacks like tLDP (targeted LDP) when implementing rLFA (remote loop-free alternate). We need tLDP to learn the label bindings of the remote node.
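A minimal Python sketch of why rLFA needs the remote bindings. It assumes the usual rLFA repair stack: an outer label that gets the packet to the PQ node (learned from the directly connected repair next hop via LDP) and an inner label that is the PQ node's own label for the destination, which the repairing node can only learn over tLDP. Node names and label values are hypothetical.

```python
# Bindings learned from the directly connected repair next hop (plain LDP).
LDP_BINDINGS = {"N1": {"PQ": 500, "DEST": 510}}
# Bindings learned from the remote PQ node over a targeted LDP session.
TLDP_BINDINGS = {"PQ": {"DEST": 900}}

def rlfa_repair_stack(repair_nh, pq_node, dest):
    """Label stack (top first) pushed to reach dest via a remote PQ node."""
    outer = LDP_BINDINGS[repair_nh][pq_node]  # gets the packet to the PQ node
    # N1's own label for DEST (510) is useless here: the PQ node will look up
    # the inner label in *its* label space, so we need the PQ node's binding.
    inner = TLDP_BINDINGS.get(pq_node, {}).get(dest)
    if inner is None:
        raise LookupError(f"no tLDP binding from {pq_node} for {dest}")
    return [outer, inner]
```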
With regard to label scope, the label space today is chassis-wide (per-platform) in every device I've ever seen, but the standards fully allow per-interface label spaces.
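For comparison, a Python sketch of the lookup key in the two label-space models; the interface names and entries are hypothetical.

```python
# Per-platform (chassis-wide) label space: the label alone selects the entry.
PER_PLATFORM = {100: "SWAP 200 towards D"}

# Per-interface label spaces: the same label value may be reused on different
# interfaces, so the lookup key is (incoming interface, label).
PER_INTERFACE = {
    ("to-B", 100): "SWAP 200 towards D",
    ("to-C", 100): "POP towards E",
}

def lookup_per_platform(label):
    return PER_PLATFORM[label]

def lookup_per_interface(in_interface, label):
    return PER_INTERFACE[(in_interface, label)]
```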