Why Equal Cost LSPs Are Not Load Balancing with Auto-Bandwidth


First, I'm adding this question and answering it myself because this behavior was documented absolutely nowhere that I could find. Hopefully it will help someone.


We use auto bandwidth to handle the bandwidth subscriptions for our LSPs. The LSPs are equal cost and appear in our forwarding/routing tables appropriately as available next hops for each destination.

However, for a single destination, the four equal-cost LSPs are not load balancing equally (or even close to equally). We understand that JUNOS uses a per-flow load-balancing algorithm despite the "per-packet" statement in the policy that enables load balancing. But that does not explain the major difference between the subscriptions of the LSPs (this imbalance happens multiple times per day; it is not a one-off occurrence), like so:

jhead@R1> show route protocol rsvp detail (2 entries, 1 announced)
        State: <FlashAll>
        *RSVP   Preference: 7/1
                Next hop: via xe-0/0/0.0 weight 0x1 balance 35%, selected
                Label-switched-path LSP1
                Next hop: via xe-1/0/0.0 weight 0x1 balance 35%
                Label-switched-path LSP2
                Next hop: via xe-0/0/1.0 weight 0x1 balance 26%
                Label-switched-path LSP3
                Next hop: via xe-0/0/0.0 weight 0x1 balance 5%
                Label-switched-path LSP4

R1-R4 are MX480s and CORE-R1-R4 are MX960s.

Below are graphs comparing RSVP subscription and utilization for each LSP. Red is subscription, green is utilization. You can see that the utilization follows the reservation almost exactly throughout the day. We would expect the subscriptions of LSPs toward the same destination to be very close to each other.

[Four graphs: RSVP subscription (red) versus LSP utilization (green)]


R1-R4 are the ingress routers for all of the LSPs; they have either 2 or 4 LSPs toward each core router.



The LSP configuration below covers a single destination from R1, just as an example. All LSPs are configured exactly the same way (again, with either 2 or 4 per destination).

[edit protocols mpls]
    statistics {
        file mpls-stats;
        interval 300;
    }
    traffic-engineering bgp;
    label-switched-path LSP1 {
        optimize-timer 300;
        auto-bandwidth {
            adjust-interval 7200;
            adjust-threshold 10;
            minimum-bandwidth 100m;
            maximum-bandwidth 4g;
            adjust-threshold-overflow-limit 2;
            adjust-threshold-underflow-limit 4;
        }
        primary primary-loose;
    }
    label-switched-path LSP2 {
        optimize-timer 300;
        auto-bandwidth {
            adjust-interval 7200;
            adjust-threshold 10;
            minimum-bandwidth 100m;
            maximum-bandwidth 4g;
            adjust-threshold-overflow-limit 2;
            adjust-threshold-underflow-limit 4;
        }
        primary primary-loose;
    }
    label-switched-path LSP3 {
        optimize-timer 300;
        auto-bandwidth {
            adjust-interval 7200;
            adjust-threshold 10;
            minimum-bandwidth 100m;
            maximum-bandwidth 4g;
            adjust-threshold-overflow-limit 2;
            adjust-threshold-underflow-limit 4;
        }
        primary primary-loose;
    }
    label-switched-path LSP4 {
        optimize-timer 300;
        auto-bandwidth {
            adjust-interval 7200;
            adjust-threshold 10;
            minimum-bandwidth 100m;
            maximum-bandwidth 4g;
            adjust-threshold-overflow-limit 2;
            adjust-threshold-underflow-limit 4;
        }
        primary primary-loose;
    }

[edit protocols rsvp]
load-balance bandwidth;
interface xe-0/0/0.0 {
    bandwidth 9g;
}
interface xe-0/0/1.0 {
    bandwidth 9g;
}
interface xe-1/0/0.0 {
    bandwidth 9g;
}

[edit routing-options forwarding-table]
export load-balance;

Best Answer

The problem is this statement:

[edit protocols rsvp]
load-balance bandwidth

If you look at the Juniper documentation for Unequal Cost Load Balancing RSVP LSPs, it states:

For uneven load balancing using bandwidth to work, you must have at least two equal-cost LSPs toward the same egress router and at least one of the LSPs must have a bandwidth value configured at the [edit protocols mpls label-switched-path lsp-path-name] hierarchy level. If no LSPs have bandwidth configured, equal distribution load balancing is performed. If only some LSPs have bandwidth configured, the LSPs without any bandwidth configured do not receive any traffic.

This implies that, even with that feature configured, traffic is still distributed equally unless you statically set a bandwidth value on at least one individual LSP, like so:

[edit protocols mpls label-switched-path LSP1]
bandwidth 2g

However, auto-bandwidth does in fact count as setting a bandwidth value, despite it not being present in the configuration.
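The rule quoted from the documentation can be sketched in a few lines of Python. This is only an illustration of the quoted text, not Juniper's implementation: the function name `split_ratios` is hypothetical, and the assumption that traffic is split in direct proportion to each LSP's bandwidth value is mine.

```python
from typing import List, Optional, Sequence


def split_ratios(bandwidths: Sequence[Optional[float]]) -> List[float]:
    """Return the share of traffic per LSP under the quoted rule.

    bandwidths[i] is the bandwidth value of LSP i, or None if it has none.
    Assumption: shares are directly proportional to the bandwidth values.
    """
    if not bandwidths:
        return []
    configured = [b for b in bandwidths if b is not None]
    if any(b <= 0 for b in configured):
        raise ValueError("bandwidth values must be positive")
    if not configured:
        # No LSP has a bandwidth value: equal distribution.
        return [1.0 / len(bandwidths)] * len(bandwidths)
    total = sum(configured)
    # LSPs without a bandwidth value receive no traffic.
    return [b / total if b is not None else 0.0 for b in bandwidths]
```

With four LSPs and no bandwidth values, every LSP gets a quarter of the traffic. Once auto-bandwidth has assigned values to the LSPs, the unequal branch applies even though no `bandwidth` statement appears in the configuration.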

When auto-bandwidth is enabled, RPD begins monitoring bandwidth consumption. It assigns bandwidth values based on utilization, and the "load-balance bandwidth" statement in RSVP then immediately tries to keep the traffic ratios within those subscriptions (35, 35, 26, and 5 percent, respectively). The problem is that this never gives auto-bandwidth the chance to adjust evenly, because the goal of "load-balance bandwidth" is to keep the traffic as close to those ratios as possible. That behavior makes sense when the bandwidths are deliberately set to something like 10, 30, 20, 40.

It is essentially a race condition between "load-balance bandwidth" and "auto-bandwidth".
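The interaction can be illustrated with a toy model in Python. This is a sketch under loud assumptions, not a description of how RPD or the PFE actually work: it assumes the weighted split follows the current reservations exactly, that auto-bandwidth sets each reservation to the traffic it measured, and it ignores per-flow hashing, the minimum/maximum bandwidth limits, and the adjust thresholds. The names `simulate` and `shares` are hypothetical.

```python
def simulate(reserved, total_traffic, rounds, weighted):
    """Alternate between forwarding traffic and adjusting reservations."""
    reserved = list(reserved)
    for _ in range(rounds):
        if weighted:
            # "load-balance bandwidth": the split follows current reservations.
            total_reserved = sum(reserved)
            traffic = [total_traffic * r / total_reserved for r in reserved]
        else:
            # Plain equal-cost balancing: idealised equal split.
            traffic = [total_traffic / len(reserved)] * len(reserved)
        # auto-bandwidth: the new reservation follows the measured traffic.
        reserved = traffic
    return reserved


def shares(values):
    """Express each value as a rounded percentage of the total."""
    total = sum(values)
    return [round(100 * v / total) for v in values]


# Balance percentages from the route output, used as relative weights.
start = [35, 35, 26, 5]
print(shares(simulate(start, 4000, 10, weighted=True)))
print(shares(simulate(start, 4000, 10, weighted=False)))
```

In this model the weighted case keeps the 35/35/26/5 split no matter how many rounds run, because each mechanism simply confirms the other's last result. The unweighted case settles at 25 percent per LSP after the first adjustment.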

After removing:

[edit protocols rsvp]
load-balance bandwidth

Traffic adjusted (with a slight hiccup, seen below):

NOTE: This is an example from a different router that was affected by the same issue.

jhead@R1> show log mpls-stats

LSP1 (LSP ID 3388, Tunnel ID 2646)    177150801 pkt   155450491134 Byte 178572 pps 152286259 Bps Util 228.46% Reserved Bw 66660264 Bps
LSP2 (LSP ID 3393, Tunnel ID 2647)            0 pkt              0 Byte      0 pps        0 Bps Util  0.00% Reserved Bw 116698880 Bps

Because you remove the ability to load-balance by bandwidth (for RSVP), the PFE reprograms to only a single path until an auto-bandwidth adjustment occurs automatically. Alternatively, you can force an adjustment:

request mpls lsp adjust-autobandwidth
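To spot LSPs whose measured rate is far from their reservation, the mpls-stats lines shown earlier can be parsed with a short Python script. This is a minimal sketch that assumes the line layout is exactly as in the sample output above; the regular expression and the name `parse_stats_line` are hypothetical, and the utilization it computes (rate divided by reserved bandwidth) is my reading of the fields, not a confirmed Junos formula.

```python
import re

# Assumed layout, taken from the sample log lines only.
LINE_RE = re.compile(
    r"^(?P<name>\S+)\s+\(LSP ID (?P<lsp_id>\d+), Tunnel ID (?P<tunnel_id>\d+)\)"
    r".*?\s(?P<bps>\d+) Bps Util\s+(?P<util>[\d.]+)% Reserved Bw (?P<reserved>\d+) Bps"
)


def parse_stats_line(line):
    """Return a dict of fields for one mpls-stats line, or None if no match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    bps = int(match["bps"])
    reserved = int(match["reserved"])
    return {
        "name": match["name"],
        "bps": bps,
        "reserved_bps": reserved,
        "logged_util": float(match["util"]),
        # Rate as a percentage of the reservation; None if nothing is reserved.
        "computed_util": 100 * bps / reserved if reserved else None,
    }


sample = (
    "LSP1 (LSP ID 3388, Tunnel ID 2646)    177150801 pkt   155450491134 Byte "
    "178572 pps 152286259 Bps Util 228.46% Reserved Bw 66660264 Bps"
)
print(parse_stats_line(sample))
```

For the LSP1 sample, the computed figure lands within a few hundredths of a percent of the logged Util value, which suggests the field is the measured rate relative to the reserved bandwidth.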

Below are the bandwidth adjustments for two LSPs with the same symptoms. The configuration change and the adjustments happened midday Friday; you can see the difference in subscriptions almost immediately.

[Two graphs: bandwidth adjustments for the two LSPs]
