Juniper QSFP+ – Troubleshooting Partially Faulty Breakout Port

juniper, juniper-junos

Network Equipment: Juniper EX4600, 40GbE QSFP+ -> 4x 10GbE SFP+ DAC cable, Mellanox ConnectX-2

Problem: One of the SFP+ physical links cannot be brought up (xe-0/0/25:0 Physical Link Down; xe-0/0/25:1, :2, :3 Physical Link Up), so the server attached to it has no network access.

Existing Troubleshooting Performed, Information Collected:

  1. Network Card Working:

    • Tested the server on xe-0/0/25:1: physical link up, network access okay. Tested it on xe-0/0/25:0: physical link down, no network access.
    • When the server is connected to xe-0/0/25:1 (working), the network card's "connection established" light turns green and the activity light soon starts flashing. The corresponding LED for xe-0/0/25:1 on the EX4600 is on and green.
    • When the server is connected to xe-0/0/25:0 (faulty), the network card's "connection established" light turns green, but the activity light stays off. The corresponding LED for xe-0/0/25:0 on the EX4600 is off.
  2. JunOS network setting for xe-0/0/25:1 (working)

root> show interfaces xe-0/0/25:1 detail
Physical interface: xe-0/0/25:1, Enabled, Physical link is Up
  Interface index: 721, SNMP ifIndex: 557, Generation: 214
  Link-level type: Ethernet, MTU: 1514, MRU: 0, Speed: 10Gbps, BPDU Error: None,
  MAC-REWRITE Error: None, Loopback: Disabled, Source filtering: Disabled,
  Flow control: Disabled, Media type: Fiber
  Device flags   : Present Running
  Interface flags: SNMP-Traps Internal: 0x4000
  Link flags     : None
  CoS queues     : 12 supported, 12 maximum usable queues
  Hold-times     : Up 0 ms, Down 0 ms
  Current address: 0c:86:10:3d:89:20, Hardware address: 0c:86:10:3d:89:20
  Last flapped   : 2018-07-28 02:36:09 UTC (2w3d 04:09 ago)
  Statistics last cleared: Never
  Traffic statistics:
   Input  bytes  :      177488100981384                    0 bps
   Output bytes  :      167345335559587                  176 bps
   Input  packets:         124325557902                    0 pps
   Output packets:         117878406643                    0 pps
   IPv6 transit statistics:
    Input  bytes  :                   0
    Output bytes  :                   0
    Input  packets:                   0
    Output packets:                   0
  Egress queues: 12 supported, 5 in use
  Queue counters:       Queued packets  Transmitted packets      Dropped packets
    0                                0         117747387692             57924403
    3                                0                    0                    0
    4                                0                    0                    0
    7                                0             29892590                    0
    8                                0             54714667                    0
  Queue number:         Mapped forwarding classes
    0                   best-effort
    3                   fcoe
    4                   no-loss
    7                   network-control
    8                   mcast
  Active alarms  : None
  Active defects : None
  Interface transmit statistics: Disabled
  MACSec statistics:
    Output
        Secure Channel Transmitted
        Protected Packets               : 0
        Encrypted Packets               : 0
        Protected Bytes                 : 0
        Encrypted Bytes                 : 0
     Input
        Secure Channel Received
        Accepted Packets                : 0
        Validated Bytes                 : 0
        Decrypted Bytes                 : 0

  Logical interface xe-0/0/25:1.0 (Index 609) (SNMP ifIndex 561)
   (Generation 196)
    Flags: Up SNMP-Traps 0x24024000 Encapsulation: Ethernet-Bridge
    Traffic statistics:
     Input  bytes  :             50018849
     Output bytes  :            570590460
     Input  packets:               767221
     Output packets:              1921180
    Local statistics:
     Input  bytes  :             50018849
     Output bytes  :            570590460
     Input  packets:               767221
     Output packets:              1921180
    Transit statistics:
     Input  bytes  :                    0                    0 bps
     Output bytes  :                    0                    0 bps
     Input  packets:                    0                    0 pps
     Output packets:                    0                    0 pps
    Protocol eth-switch, MTU: 1514, Generation: 221, Route table: 5
  3. JunOS network setting for xe-0/0/25:0 (faulty)
root> show interfaces xe-0/0/25:0 detail
Physical interface: xe-0/0/25:0, Enabled, Physical link is Down
  Interface index: 720, SNMP ifIndex: 556, Generation: 213
  Link-level type: Ethernet, MTU: 1514, MRU: 0, Speed: 10Gbps, BPDU Error: None,
  MAC-REWRITE Error: None, Loopback: Disabled, Source filtering: Disabled,
  Flow control: Disabled, Media type: Fiber
  Device flags   : Present Running Down
  Interface flags: Hardware-Down SNMP-Traps Internal: 0x4000
  Link flags     : None
  CoS queues     : 12 supported, 12 maximum usable queues
  Hold-times     : Up 0 ms, Down 0 ms
  Current address: 0c:86:10:3d:89:1f, Hardware address: 0c:86:10:3d:89:1f
  Last flapped   : 2018-08-14 04:17:55 UTC (02:02:59 ago)
  Statistics last cleared: Never
  Traffic statistics:
   Input  bytes  :                 8943                    0 bps
   Output bytes  :               128037                    0 bps
   Input  packets:                   73                    0 pps
   Output packets:                 1069                    0 pps
   IPv6 transit statistics:
    Input  bytes  :                   0
    Output bytes  :                   0
    Input  packets:                   0
    Output packets:                   0
  Egress queues: 12 supported, 5 in use
  Queue counters:       Queued packets  Transmitted packets      Dropped packets
    0                                0                    1                    0
    3                                0                    0                    0
    4                                0                    0                    0
    7                                0                  506                    0
    8                                0                  207                    0
  Queue number:         Mapped forwarding classes
    0                   best-effort
    3                   fcoe
    4                   no-loss
    7                   network-control
    8                   mcast
  Active alarms  : LINK
  Active defects : LINK
  Interface transmit statistics: Disabled
  MACSec statistics:
    Output
        Secure Channel Transmitted
        Protected Packets               : 0
        Encrypted Packets               : 0
        Protected Bytes                 : 0
        Encrypted Bytes                 : 0
     Input
        Secure Channel Received
        Accepted Packets                : 0
        Validated Bytes                 : 0
        Decrypted Bytes                 : 0

  Logical interface xe-0/0/25:0.0 (Index 608) (SNMP ifIndex 558)
   (Generation 195)
    Flags: Device-Down SNMP-Traps 0x24024000 Encapsulation: Ethernet-Bridge
    Traffic statistics:
     Input  bytes  :                 4565
     Output bytes  :                10395
     Input  packets:                   48
     Output packets:                   35
    Local statistics:
     Input  bytes  :                 4565
     Output bytes  :                10395
     Input  packets:                   48
     Output packets:                   35
    Transit statistics:
     Input  bytes  :                    0                    0 bps
     Output bytes  :                    0                    0 bps
     Input  packets:                    0                    0 pps
     Output packets:                    0                    0 pps
    Protocol eth-switch, MTU: 1514, Generation: 220, Route table: 5

Note: Last flapped : 2018-08-14 04:17:55 UTC (02:02:59 ago), Active alarms : LINK, Active defects : LINK. The server was rebooted 2 hours ago, so the switch detected the device attached to the port; however, the active alarm and defect both still show "LINK".
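To make the comparison between the working and the faulty port easier to read, the status fields of the two outputs can be diffed. The following is only a minimal sketch: the function names `parse_status` and `diff_status` are hypothetical, the sample text is a trimmed excerpt of the two outputs above, and it assumes the output has been saved as plain text (it does not talk to the switch).

```python
import re

# Status fields worth comparing between a working and a faulty port.
FIELDS = ("Device flags", "Interface flags", "Active alarms", "Active defects")


def parse_status(output):
    """Extract the link state and selected fields from `show interfaces` text."""
    status = {}
    # Look only at the physical-interface part, before any logical interface.
    physical = output.split("Logical interface", 1)[0]
    match = re.search(r"Physical link is (\w+)", physical)
    if match:
        status["Physical link"] = match.group(1)
    for line in physical.splitlines():
        # Split at the first colon: "Device flags   : Present Running".
        name, sep, value = line.partition(":")
        if sep and name.strip() in FIELDS:
            status[name.strip()] = value.strip()
    return status


def diff_status(working, faulty):
    """Return {field: (working_value, faulty_value)} for fields that differ."""
    a, b = parse_status(working), parse_status(faulty)
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


# Trimmed excerpts of the outputs for xe-0/0/25:1 and xe-0/0/25:0 shown above.
WORKING = """Physical interface: xe-0/0/25:1, Enabled, Physical link is Up
  Device flags   : Present Running
  Interface flags: SNMP-Traps Internal: 0x4000
  Link flags     : None
  Active alarms  : None
  Active defects : None
"""

FAULTY = """Physical interface: xe-0/0/25:0, Enabled, Physical link is Down
  Device flags   : Present Running Down
  Interface flags: Hardware-Down SNMP-Traps Internal: 0x4000
  Link flags     : None
  Active alarms  : LINK
  Active defects : LINK
"""

if __name__ == "__main__":
    for field, (good, bad) in diff_status(WORKING, FAULTY).items():
        print(f"{field}: working={good!r} faulty={bad!r}")
```

Fields that are identical on both ports (for example Link flags) are left out, so only the link state, the Down/Hardware-Down flags and the LINK alarm and defect remain.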

  4. Spanning tree state: Discarding (blocked)
root> show ethernet-switching interface xe-0/0/25:0
Routing Instance Name : default-switch
Logical Interface flags (DL - disable learning, AD - packet action drop,
                         LH - MAC limit hit, DN - interface down,
                         SCTL - shutdown by Storm-control,
                         MMAS - Mac-move action shutdown ) 

Logical          Vlan          TAG     MAC         STP         Logical           Tagging 
interface        members               limit       state       interface flags  
xe-0/0/25:0.0                          294912                   DN                untagged   
                 default       1       294912      Discarding                     untagged

Updates:

  1. Further debugging revealed that the issue lies between two specific network cards (Mellanox ConnectX-2), which cannot connect to the switch simultaneously. Each card works perfectly with the switch on its own. The MAC addresses of the two cards are different. On the same switch, 20+ other Mellanox ConnectX-3 10GbE network cards run concurrently without problems.

Update

Problem temporarily fixed by disabling RSTP on the problematic port, deleting the subinterface, and adding it back. However, sometimes when the server reboots, the physical link stays down. The interface must then be toggled off and on administratively to bring the physical link back up:

root> show interfaces xe-0/0/25:0
Physical interface: xe-0/0/25:0, Enabled, Physical link is Down
  Interface index: 720, SNMP ifIndex: 556
  Link-level type: Ethernet, MTU: 1514, MRU: 0, Speed: 10Gbps, BPDU Error: None,
  MAC-REWRITE Error: None, Loopback: Disabled, Source filtering: Disabled,
  Flow control: Disabled, Media type: Fiber
  Device flags   : Present Running Down
  Interface flags: Hardware-Down SNMP-Traps Internal: 0x4000
  Link flags     : None
  CoS queues     : 12 supported, 12 maximum usable queues
  Current address: 0c:86:10:3d:89:1f, Hardware address: 0c:86:10:3d:89:1f
  Last flapped   : 2018-08-16 11:14:14 UTC (16:29:48 ago)
  Input rate     : 0 bps (0 pps)
  Output rate    : 0 bps (0 pps)
  Active alarms  : LINK
  Active defects : LINK
  Interface transmit statistics: Disabled

  Logical interface xe-0/0/25:0.0 (Index 608) (SNMP ifIndex 558)
    Flags: Device-Down SNMP-Traps 0x24024000 Encapsulation: Ethernet-Bridge
    Input packets : 4067
    Output packets: 4827
    Protocol eth-switch, MTU: 1514

{master:0}[edit]
root# set interfaces xe-0/0/25:0 disable

{master:0}[edit]
root# commit
configuration check succeeds
fpc1:
commit complete
commit complete

{master:0}[edit]
root# run show interfaces xe-0/0/25:0
Physical interface: xe-0/0/25:0, Administratively down, Physical link is Down
  Interface index: 720, SNMP ifIndex: 556
  Link-level type: Ethernet, MTU: 1514, MRU: 0, Speed: 10Gbps, BPDU Error: None,
  MAC-REWRITE Error: None, Loopback: Disabled, Source filtering: Disabled,
  Flow control: Disabled, Media type: Fiber
  Device flags   : Present Running Down
  Interface flags: Hardware-Down Down SNMP-Traps Internal: 0x4000
  Link flags     : None
  CoS queues     : 12 supported, 12 maximum usable queues
  Current address: 0c:86:10:3d:89:1f, Hardware address: 0c:86:10:3d:89:1f
  Last flapped   : 2018-08-16 11:14:14 UTC (16:38:47 ago)
  Input rate     : 0 bps (0 pps)
  Output rate    : 0 bps (0 pps)
  Active alarms  : LINK
  Active defects : LINK
  Interface transmit statistics: Disabled

  Logical interface xe-0/0/25:0.0 (Index 608) (SNMP ifIndex 558)
    Flags: Device-Down SNMP-Traps 0x24024000 Encapsulation: Ethernet-Bridge
    Input packets : 4067
    Output packets: 4827
    Protocol eth-switch, MTU: 1514

{master:0}[edit]
root# delete interfaces xe-0/0/25:0 disable

{master:0}[edit]
root# commit
configuration check succeeds
fpc1:
commit complete
commit complete

root# run show interfaces xe-0/0/25:0
Physical interface: xe-0/0/25:0, Enabled, Physical link is Up
  Interface index: 720, SNMP ifIndex: 556
  Link-level type: Ethernet, MTU: 1514, MRU: 0, Speed: 10Gbps, BPDU Error: None,
  MAC-REWRITE Error: None, Loopback: Disabled, Source filtering: Disabled,
  Flow control: Disabled, Media type: Fiber
  Device flags   : Present Running
  Interface flags: SNMP-Traps Internal: 0x4000
  Link flags     : None
  CoS queues     : 12 supported, 12 maximum usable queues
  Current address: 0c:86:10:3d:89:1f, Hardware address: 0c:86:10:3d:89:1f
  Last flapped   : 2018-08-17 03:53:25 UTC (00:00:16 ago)
  Input rate     : 304 bps (0 pps)
  Output rate    : 0 bps (0 pps)
  Active alarms  : None
  Active defects : None
  Interface transmit statistics: Disabled

  Logical interface xe-0/0/25:0.0 (Index 608) (SNMP ifIndex 558)
    Flags: Up SNMP-Traps 0x24024000 Encapsulation: Ethernet-Bridge
    Input packets : 4132
    Output packets: 4829
    Protocol eth-switch, MTU: 1514
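
The manual toggle in the transcript above can be captured as a small helper that decides, from the first line of `show interfaces` output, whether the port is in the stuck state (administratively enabled, physical link down) and, if so, lists the same configuration-mode commands used above. This is a rough sketch only: `bounce_commands` and `plan_bounce` are hypothetical names, nothing is sent to the switch, and whether a toggle actually restores the link is not guaranteed.

```python
import re


def bounce_commands(interface):
    """Configuration-mode commands that toggle the port off and on again,
    in the same order as the transcript above."""
    return [
        f"set interfaces {interface} disable",
        "commit",
        f"delete interfaces {interface} disable",
        "commit",
    ]


def plan_bounce(show_output):
    """Return the toggle commands if the port is enabled but its link is down,
    otherwise an empty list."""
    match = re.search(
        r"Physical interface: (\S+), (Enabled|Administratively down), "
        r"Physical link is (Up|Down)",
        show_output,
    )
    if not match:
        return []  # Unrecognised output: do nothing.
    interface, admin_state, link_state = match.groups()
    if admin_state == "Enabled" and link_state == "Down":
        return bounce_commands(interface)
    return []


if __name__ == "__main__":
    first_line = "Physical interface: xe-0/0/25:0, Enabled, Physical link is Down"
    for command in plan_bounce(first_line):
        print(command)
```

The helper only prints commands for review; applying them is left to the operator, since each `commit` affects the running configuration.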

Best Answer

The solution (although not a good one) is to disable RSTP, reconfigure the subinterface, and physically disconnect and reconnect the network cable multiple times. This is the only approach we found that has a high probability of bringing up networking on both network cards.

Note: This is only a workaround that temporarily allows both network cards to connect; it does not fix the underlying problem. We resolved the situation by replacing one of the network cards.
