Juniper SRX240H loses connectivity to upstream router

juniper-srx

I have two SRX240s clustered together. The cluster is configured with two redundancy groups and three reth interfaces.

reth1 is connected to our internet connection via ge-0/0/5 and ge-5/0/5. Our internet connection is a 100 Mbps each way leased line. We have to disable auto-negotiation and manually set the speed and full-duplex mode.

Here is the config showing those interfaces:

cluster {
    reth-count 3;
    redundancy-group 0 {
        node 0 priority 100;
        node 1 priority 99;
    }
    redundancy-group 1 {
        node 0 priority 100;
        node 1 priority 99;
        preempt;
        interface-monitor {
            ge-0/0/5 weight 255;
            ge-5/0/5 weight 255;
        }
    }
}

interfaces {
    ge-0/0/5 {
        speed 100m;
        link-mode full-duplex;
        gigether-options {
            no-auto-negotiation;
            redundant-parent reth1;
        }
    }
    ge-5/0/5 {
        speed 100m;
        gigether-options {
            no-auto-negotiation;
            redundant-parent reth1;
        }
    }
    reth1 {
        redundant-ether-options {
            redundancy-group 1;
        }
        unit 0 {
            family inet {
                address 1.1.1.25/30;
            }
        }
    } 
}

The upstream NTE sits on IP address 1.1.1.26/30 (IPs changed).

The connection generally works fine: I get close to the maximum download speed for the line, with low latency and everything else you would expect. However, every now and then I suddenly can't ping the upstream NTE. Connectivity just stops.

If I check the interface status, it shows as up:

{primary:node0}
gareth@FW01> show interfaces ge-0/0/5
Physical interface: ge-0/0/5, Enabled, Physical link is Up
  Interface index: 139, SNMP ifIndex: 527
  Link-level type: Ethernet, MTU: 1514, Link-mode: Full-duplex, Speed: 100mbps,
  BPDU Error: None, MAC-REWRITE Error: None, Loopback: Disabled,
  Source filtering: Disabled, Flow control: Enabled, Auto-negotiation: Enabled,
  Remote fault: Online
  Device flags   : Present Running
  Interface flags: SNMP-Traps Internal: 0x0
  Link flags     : None
  CoS queues     : 8 supported, 8 maximum usable queues
  Current address: 00:10:de:ff:20:01, Hardware address: 08:81:f4:cd:a1:05
  Last flapped   : 2013-05-16 01:35:08 UTC (03:39:01 ago)
  Input rate     : 7144 bps (10 pps)
  Output rate    : 34488 bps (58 pps)
  Active alarms  : None
  Active defects : None
  Interface transmit statistics: Disabled

  Logical interface ge-0/0/5.0 (Index 74) (SNMP ifIndex 528)
    Flags: SNMP-Traps 0x0 Encapsulation: ENET2
    Input packets : 20966964
    Output packets: 13453431
    Security: Zone: Null
    Protocol aenet, AE bundle: reth1.0   Link Index: 0

{primary:node0}
gareth@FW01> show interfaces reth1.0
  Logical interface reth1.0 (Index 68) (SNMP ifIndex 578)
    Flags: SNMP-Traps 0x0 Encapsulation: ENET2
    Statistics        Packets        pps         Bytes          bps
    Bundle:
        Input :      20967161         12   19068965037         9320
        Output:      13454302         44    3387733487        29088
    Security: Zone: untrust-internet
    Allowed host-inbound traffic : ping
    Protocol inet, MTU: 1500
      Flags: Sendbcast-pkt-to-re
      Addresses, Flags: Is-Preferred Is-Primary
        Destination: 1.1.1.24/30, Local: 1.1.1.25,
        Broadcast: 1.1.1.27

The last flap was me changing the network cable to rule that out. Unplugging the cable and connecting a new one brings the connection back, but I am a little reluctant to trust it at the moment.

Is there anything else I can look at if it happens again?

I did find the following post, http://kb.juniper.net/InfoCenter/index?page=content&id=KB16672, which says it should work on supported versions:

The releases, specifically for High End SRX devices, are 11.2R1, 11.1R1, 10.3R3, 10.4R2, 10.2R4 or later. For Branch SRX devices, this is supported only from Junos 11.1R4, 11.2R2, and 11.4R1 onwards.

I am running version 11.2R4.3.

Best Answer

Here's some version info that should help. The other thing that immediately strikes me is that autonegotiation is disabled, which shouldn't be necessary. Is this something your provider is asking for, or is it based on past experience? I recommend reading what's out there, since leaving autonegotiation enabled has been the recommended practice for over 10 years. Anything you're connecting to should work fine with it as well (the last time I had issues was on a 3500XL, and we're talking ooooold). Check out:

http://etherealmind.com/ethernet-autonegotiation-works-why-how-standard-should-be-set/
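
If the provider confirms that their NTE port can autonegotiate, backing out the manual settings could look roughly like this. It is only a sketch based on the interface config posted in the question, not something I have tested on your cluster. It assumes the far end is set to autonegotiate as well; if the NTE stays hard-set to 100/full while the SRX autonegotiates, the SRX side will normally fall back to half duplex, which is worse than what you have now.

```
# Configuration mode on the primary node.
# Assumption: the provider's NTE port is set to autonegotiate as well.
delete interfaces ge-0/0/5 speed
delete interfaces ge-0/0/5 link-mode
delete interfaces ge-0/0/5 gigether-options no-auto-negotiation
# ge-5/0/5 has no link-mode statement in the posted config, so only two deletes are needed.
delete interfaces ge-5/0/5 speed
delete interfaces ge-5/0/5 gigether-options no-auto-negotiation
# Validate the candidate config, then activate it.
commit check
commit
```

Do this from the console or the LAN side rather than over the internet link. If the link does not come back cleanly, `rollback 1` followed by `commit` restores the previous settings.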

First off, since you mentioned versions: per JTAC, Junos 11.4R7.5 is the recommended release for the SRX240 as of 8 April 2013.

http://kb.juniper.net/InfoCenter/index?page=content&id=KB21476 (need login)

Also, JTAC recommends EEOL releases on SRX or J Series for IPsec features (this may or may not apply to you, but FYI). For me, the only releases available for the SRX240H on juniper.net are 10.4, 11.4, and 12.1, which may help set your expectations on what to run.

http://www.juniper.net/alerts/viewalert.jsp?txtAlertNumber=PSN-2013-01-822&actionBtn=Search (need login)

You are also running a version that was released before the malformed TCP vulnerability came out. I shudder to think that a live exploit exists, but you definitely need to be on a version published around January or February 2013, or later. The versions you need for 11.2 are 11.2R5.5 or 11.2R7.5 (or higher).

http://osvdb.org/89751

I mentioned versions because the SRX is a 'rapidly evolving' platform as well, and anecdotal evidence and hearsay suggest that you'll want to upgrade more frequently.
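
Since you asked what else to look at: if it happens again before you upgrade, it may be worth capturing a few operational-mode outputs while the link is down, before you reseat the cable. This is a rough list using the interface names and NTE address from your post; the match patterns are just my guesses at useful filters, so widen them if they hide something. Note that your `show interfaces` output reports `Auto-negotiation: Enabled` even though the config has `no-auto-negotiation`, so it is worth comparing what the port reports against what the provider has set.

```
# Operational mode, run while the problem is occurring.
# Cluster state and monitored-interface status for both nodes.
show chassis cluster status
show chassis cluster interfaces
# Error counters and negotiation state on both reth1 member links.
show interfaces ge-0/0/5 extensive | match "error|collision|negotiation"
show interfaces ge-5/0/5 extensive | match "error|collision|negotiation"
# Is there still an ARP entry for the upstream NTE?
show arp no-resolve | match 1.1.1.26
# Anything logged about the member links or the reth around the time of the outage.
show log messages | match "ge-0/0/5|ge-5/0/5|reth1"
```

Climbing error or collision counters would point toward a speed/duplex problem, while a missing ARP entry with clean counters would point more toward the provider side or a software issue.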