SRX DHCP client compatibility with HP Procurve DHCP Relay

Tags: dhcp, hp-procurve, juniper, juniper-junos, juniper-srx

I am trying to bootstrap the config on some Juniper SRX100s and am having some DHCP issues.

Specifically, I am connecting the 0/0 port (fe-0/0/0 in the software) to my existing network, where DHCP has worked quite reliably for just about every other device I've used. The SRX100s are not getting DHCP addresses. The SRX100 is in its out-of-the-box default config when I attempt this.

I brought one of the devices to my house and plugged it into my home network and it got an IP address on my home network via DHCP with no problems.

My office network has a Procurve 1400 (layer 2 only) switch on my desktop, uplinked to a Polycom IP670 IP phone (acts as a simple layer 2 switch), uplinked to a Procurve 3500yl switch acting as a router for the network with "ip helper-address 1.1.1.1" on the vlan interface pointing to the DHCP server for DHCP relay.

Does anyone have any experience with an SRX DHCP client getting an IP address via a Procurve? The Procurve is running K.15.09.0012 software, though the problem has existed across multiple firmware versions. The SRX100s seem to have 11.2 on them when they come out of the box, and I think the problem continues to exist when they are upgraded to 12.1X44-D10.4.

Does anyone have any suggestions for troubleshooting this? The Procurve 3500yl doesn't seem to admit to having seen the DHCP client request coming in from the SRX100, but troubleshooting info on the Procurves in this area seems limited. The DHCP server definitely does not see any DHCPDISCOVER packets arrive relating to the SRX100.

My workaround has been to statically configure an IP address on the SRX100s to get them on the network and do the rest of my config. However, the project I am working on involves sending the SRX100s out to remote locations that are not under my control, so it depends on them reliably getting DHCP addresses for connectivity. I would really like to troubleshoot this and run down a specific cause so I know what to look for if this happens at remote sites.

Update: To double-check, I have factory-defaulted the SRX100 and plugged it directly into a port on a Procurve 3500yl, and I am still seeing the problem, so that removes the 1400 and the IP670 phone from the discussion. I've included the tcpdump output from the SRX100 below. As you can see, it's sending about the simplest possible DHCP packet, which tends to suggest that the problem is with the dhcp-relay function on the 3500yl. I can't find any way to get debug output from the 3500yl showing packets hitting the dhcp-relay function (successfully or otherwise). Suggestions on how to debug this function on the 3500yl would be greatly appreciated.

tcpdump -n -s 0 -c 1 -vvv -r juniper.dhcp.pcap 
reading from file juniper.dhcp.pcap, link-type JUNIPER_ETHER (Juniper Ethernet)
17:49:11.538670 
Juniper PCAP Flags [Ext], PCAP Extension(s) total length 16
  Device Media Type Extension TLV #3, length 1, value Ethernet (1)
  Logical Interface Encapsulation Extension TLV #6, length 1, value Ethernet (14)
  Device Interface Index Extension TLV #1, length 2, value 34304
  Logical Interface Index Extension TLV #4, length 4, value 70
-----original packet-----
IP (tos 0x0, ttl 1, id 13874, offset 0, flags [none], proto UDP (17), length 328)
0.0.0.0.68 > 255.255.255.255.67: [udp sum ok] BOOTP/DHCP, Request from a8:d0:e5:1c:68:80, length 300, xid 0x643c9869, Flags [Broadcast] (0x8000)
  Client-Ethernet-Address a8:d0:e5:1c:68:80
  Vendor-rfc1048 Extensions
    Magic Cookie 0x63825363
    DHCP-Message Option 53, length 1: Discover
    END Option 255, length 0
    PAD Option 0, length 0, occurs 56
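
As a sanity check on how minimal that packet is, here is a rough Python sketch that rebuilds an equivalent BOOTP/DHCP payload from the fields shown in the dump. The function name `build_discover` is hypothetical and the layout follows the standard BOOTP fixed header; it is only meant to show that the lengths in the capture (300 bytes of BOOTP, 328 bytes of IP) add up with nothing but option 53, the END option, and padding.

```python
import struct

MAGIC_COOKIE = b"\x63\x82\x53\x63"  # "Magic Cookie 0x63825363" in the dump


def build_discover(mac: bytes, xid: int) -> bytes:
    """Build a minimal DHCPDISCOVER payload (illustrative sketch only)."""
    fixed = struct.pack(
        "!BBBBIHH4s4s4s4s16s64s128s",
        1,          # op: BOOTREQUEST
        1,          # htype: Ethernet
        6,          # hlen: MAC address length
        0,          # hops
        xid,        # transaction id
        0,          # secs
        0x8000,     # flags: broadcast bit set, as in the capture
        b"", b"", b"", b"",  # ciaddr, yiaddr, siaddr, giaddr (all zero)
        mac,        # chaddr, zero-padded to 16 bytes by struct
        b"",        # sname (unused)
        b"",        # file (unused)
    )
    # Option 53 (message type) = 1 (Discover), then END (255).
    options = MAGIC_COOKIE + bytes([53, 1, 1]) + b"\xff"
    # Pad with zero bytes up to the 300-byte BOOTP length seen in the dump.
    return (fixed + options).ljust(300, b"\x00")


payload = build_discover(bytes.fromhex("a8d0e51c6880"), 0x643C9869)
pad_bytes = len(payload) - 236 - len(MAGIC_COOKIE) - 3 - 1
ip_total_length = 20 + 8 + len(payload)  # IP header + UDP header + BOOTP
print(len(payload), pad_bytes, ip_total_length)
```

The pad count and IP length it prints can be compared against the "occurs 56" and "length 328" values in the tcpdump output above.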

Best Answer

I opened a case with HP concerning this issue. After escalating past the useless Level 1 tech, the Level 2 tech very alertly spotted something that I had not.

The SRX is sending its DHCPDISCOVER packet with a TTL of 1. The Procurve apparently decrements the TTL and uses the resulting value in the relayed packet it sends to the DHCP server. In this case, the decrement leaves the TTL at 0, so the packet gets dropped on the floor.

This is actually in spec for DHCP/BOOTP relay, though clearly it causes reduced interoperability. I have asked HPNetworking to treat this as a bug/RFE and change the behavior. No immediate response to that request in the case.

The SRX sending the DHCPDISCOVER with a TTL of 1 is also probably within spec, but, again, a choice of reduced interoperability, so I plan to open a case with JTAC on the same basis.

I'll add more info on the response of Juniper and HP as it becomes available.

Incidentally, I have tested the relay behavior of a Cisco 4506 (firmware version not immediately available), and a Brocade/Foundry FastIron Edge X (7.2 or 7.3 firmware, I believe, don't have immediate access to confirm) and they both handle relaying the request with TTL 1 without issue.
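
To make the difference concrete, here is a rough Python model of the two relay behaviors. The first function mirrors what HP described for the Procurve (decrement, then drop at 0). The second is only my assumption about what a more forgiving relay might do, namely send the relayed packet with its own fresh TTL; I have not confirmed that this is how the Cisco or Brocade actually implement it. Both function names and the default of 64 are hypothetical.

```python
from typing import Optional


def decrementing_relay_ttl(incoming_ttl: int) -> Optional[int]:
    """Relay that reuses the client's TTL minus one; None means dropped."""
    outgoing = incoming_ttl - 1
    return outgoing if outgoing > 0 else None


def fresh_ttl_relay(incoming_ttl: int, relay_default_ttl: int = 64) -> Optional[int]:
    """Assumed alternative: relay ignores the client's TTL and uses its own."""
    return relay_default_ttl


for client_ttl in (1, 64):
    print(
        client_ttl,
        decrementing_relay_ttl(client_ttl),
        fresh_ttl_relay(client_ttl),
    )
```

With a client TTL of 1, the first model drops the packet while the second still relays it, which matches the symptom: the DHCP server never sees a DHCPDISCOVER from the SRX100 behind the Procurve.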

UPDATE: There is a way to change the TTL value that the SRX uses on its DHCP requests, but it's not done from within the JunOS CLI; it's done from the underlying Unix OS.

root@% sysctl -w net.inet.ip.mcast_ttl=64

I have opened an RFE with HP to make their relaying function more resilient, but there has been no response from them yet on if or when that will be worked on.