PXE-Booting VM – Why Does It Aggressively Send Reverse ARP Requests?

arp dhcp pxe-boot vmware-esxi

Reverse ARP is.. well, pretty much dead, as far as I'm aware? One of the great Internet success stories in killing off a protocol? It's been deprecated in favor of BOOTP (and later, DHCP) for almost three decades.

So, I was a little surprised to notice a VM mercilessly asking for an IP address via RARP during a PXE boot – even after getting a perfectly good IP address via DHCP.


At the start of the boot, the first broadcast packet that gets sent is a Reverse ARP packet, followed immediately by a DHCP broadcast.

20:31:19.408086 ARP, Reverse Request who-is 00:0c:29:20:fd:ce tell 00:0c:29:20:fd:ce, length 46
20:31:19.441857 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from 00:0c:29:20:fd:ce, length 548
20:31:19.443536 IP 192.168.100.1.67 > 255.255.255.255.68: BOOTP/DHCP, Reply, length 300
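
For anyone who hasn't looked at one in a while, here is a rough sketch of what such a RARP request looks like on the wire. It follows the standard ARP field layout with the RARP EtherType (0x8035) and the "request reverse" opcode (3). The helper names are mine, and padding the payload to 46 bytes is an assumption made to match the length tcpdump reports above, not something I've confirmed about the VMware implementation.

```python
import struct

RARP_ETHERTYPE = 0x8035      # EtherType assigned to RARP
OP_REVERSE_REQUEST = 3       # ARP opcode "request reverse"
BROADCAST = b"\xff" * 6      # Ethernet broadcast address


def mac_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    return bytes.fromhex(mac.replace(":", ""))


def build_rarp_request(mac: str) -> bytes:
    """Build a broadcast RARP request in which the host asks about itself."""
    hw = mac_bytes(mac)
    eth_header = BROADCAST + hw + struct.pack("!H", RARP_ETHERTYPE)
    # htype=1 (Ethernet), ptype=0x0800 (IPv4), hlen=6, plen=4, opcode
    payload = struct.pack("!HHBBH", 1, 0x0800, 6, 4, OP_REVERSE_REQUEST)
    payload += hw + bytes(4)   # sender MAC, sender IP unknown (0.0.0.0)
    payload += hw + bytes(4)   # target MAC is the sender itself, IP unknown
    # Assumption: pad the 28-byte payload to the Ethernet minimum of 46 bytes.
    payload = payload.ljust(46, b"\x00")
    return eth_header + payload


frame = build_rarp_request("00:0c:29:20:fd:ce")
print(len(frame), frame[12:14].hex())
```

The sender and target hardware addresses are identical, which is why tcpdump prints "who-is X tell X".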

Apparently it doesn't like that first DHCP response, as it waits for another (note that I'm only capturing broadcast packets, so tcpdump doesn't see the rest of the DHCP conversation), but not before sending another couple of RARP requests:

20:31:19.935341 ARP, Reverse Request who-is 00:0c:29:20:fd:ce tell 00:0c:29:20:fd:ce, length 46
20:31:20.935426 ARP, Reverse Request who-is 00:0c:29:20:fd:ce tell 00:0c:29:20:fd:ce, length 46
20:31:21.500371 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from 00:0c:29:20:fd:ce, length 548
20:31:21.501288 IP 192.168.100.1.67 > 255.255.255.255.68: BOOTP/DHCP, Reply, length 300
20:31:21.504278 ARP, Request who-has 192.168.100.40 tell 192.168.100.6, length 46
20:31:22.935467 ARP, Reverse Request who-is 00:0c:29:20:fd:ce tell 00:0c:29:20:fd:ce, length 46

Ok, great; it got an IP, then ARP'd for the PXE server so that it could start the TFTP transfer. It snuck another RARP in there at the end, fine. Now it's all the way into the pxelinux environment:

(screenshot: the VM sitting in the pxelinux environment)

And then? Another RARP request, exactly 1 second after the last one.

20:31:23.935340 ARP, Reverse Request who-is 00:0c:29:20:fd:ce tell 00:0c:29:20:fd:ce, length 46

And another, exactly 2 seconds later.

20:31:25.935384 ARP, Reverse Request who-is 00:0c:29:20:fd:ce tell 00:0c:29:20:fd:ce, length 46

And then some: 3 seconds later, 5 seconds after that, then 8, 13 and 21 seconds (each gap being the sum of the previous two), and at that point it finally subsides.

20:31:28.935548 ARP, Reverse Request who-is 00:0c:29:20:fd:ce tell 00:0c:29:20:fd:ce, length 46
20:31:33.935518 ARP, Reverse Request who-is 00:0c:29:20:fd:ce tell 00:0c:29:20:fd:ce, length 46
20:31:41.935633 ARP, Reverse Request who-is 00:0c:29:20:fd:ce tell 00:0c:29:20:fd:ce, length 46
20:31:54.935970 ARP, Reverse Request who-is 00:0c:29:20:fd:ce tell 00:0c:29:20:fd:ce, length 46
20:32:15.936232 ARP, Reverse Request who-is 00:0c:29:20:fd:ce tell 00:0c:29:20:fd:ce, length 46

All while in the pxelinux environment, with a working IP address already bound to that NIC.
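
To double-check the spacing rather than eyeball it, here is a small sketch that takes the timestamps of the last eight RARP requests from the captures above and computes the gaps between them. The function name is just something I made up for illustration; the timestamps are copied verbatim from the tcpdump output.

```python
from datetime import datetime

# Timestamps of the last eight RARP requests, copied from the capture.
RARP_TIMES = [
    "20:31:22.935467",
    "20:31:23.935340",
    "20:31:25.935384",
    "20:31:28.935548",
    "20:31:33.935518",
    "20:31:41.935633",
    "20:31:54.935970",
    "20:32:15.936232",
]


def gaps_in_seconds(stamps):
    """Return the gap between consecutive timestamps, rounded to whole seconds."""
    times = [datetime.strptime(s, "%H:%M:%S.%f") for s in stamps]
    return [round((b - a).total_seconds()) for a, b in zip(times, times[1:])]


gaps = gaps_in_seconds(RARP_TIMES)
# True if every gap from the third onward is the sum of the two before it.
fibonacci_like = all(
    gaps[i] == gaps[i - 1] + gaps[i - 2] for i in range(2, len(gaps))
)
print(gaps, fibonacci_like)
```

The gaps come out as 1, 2, 3, 5, 8, 13 and 21 seconds, so whatever is sending these is backing off on a Fibonacci-style schedule rather than a plain doubling one.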


So, does anyone have any idea what this VM (or rather, every ESX(i) VM, at least on 4.1 and 5.0) is thinking?

I've verified that it occurs on both the emulated E1000 and the vmxnet3 device; is this a bit of "special" behavior of the VMware PXE code, or is this typical behavior of any ol' PXE code?

Does it make any kind of sense for it to be looking for RARP at all, since as a protocol it's not able to provide any PXE server information (as far as I know)?

Do I need to bite the bullet and set up rarpd to see how the PXE device will react to it?

Best Answer

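I may be wrong, but these look like the RARP packets that the VMkernel sends when the vSwitch has the 'Notify Switches' option enabled, which it is by default. If so, they are not a request for an address at all; they are there so that the physical switches learn which port the VM's MAC address is behind.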