How to avoid “NBP is too big to fit into free base memory” after searching for boot device repeatedly

boot, ipxe, netboot, pxe-boot

I have some servers that can repeatedly retry searching for boot devices (local SATA disk, USB stick, netboot) at startup until one is available.

I have a netboot setup with iPXE as well. By default it checks for a custom boot script for the machine's MAC address and executes that. If there is none, it exits.

This works fine for a while. But after 5 or so minutes, I start getting "NBP is too big to fit into free base memory" and it does not execute custom scripts. Instead, I think it's failing to start the iPXE 'Network Bootstrap Program'.

These systems have >64GB of RAM, so they are not running out of system RAM. But perhaps there's a very small range available to the card, or memory on the card itself, and each boot attempt has to use new memory.

It feels to me like a memory leak in the firmware of the NIC.

I would expect the NIC to be initialized completely from scratch when it says 'initializing' on the NIC's option ROM screen during boot. I would also expect that when booting via the NIC fails, all RAM used by that NIC is freed.

How do I stop this NBP error from happening? Is there a command in iPXE to free up this memory or force a full NIC re-initialization?

Best Answer

It is running out of memory: base memory, not system RAM. I'm assuming you are using legacy PC BIOS here, since everything points to that. In that environment there is only 640KB of base memory available for the initial program, no matter how much RAM is installed.

The reason it runs out is that the old PC BIOS UNDI stack has no clean way of exiting and cleaning up everything, so each failed attempt can leave some base memory allocated.

The simplest way to "fix" this is to reboot the machine. Maybe some things can be improved, but to help further I would need a more detailed log of the output, which iPXE binary you are using, and the full error message you get from iPXE, if any (it will have an ipxe.org URL in it; read that page as well).

I remember a similar issue being discussed (probably in the iPXE forum), but that was many years ago and I don't remember enough to find it. (The error message could help.)

EDIT: A few ideas to keep it going longer:

  1. If you intend to use iPXE, try a smaller build of iPXE (many features can be disabled), so that when it is actually needed it can still load with less memory available. (This does not fix the issue, but it allows more attempts before it fails.)
  2. Always chain into iPXE, and then have an embedded script that loops, something like:
#!ipxe
# Labels are declared with a leading colon but referenced without it.
:retry
autoboot || goto retry

This will retry inside iPXE forever, without doing the initialization (and eating memory) on each try.
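As a rough sketch of how idea 2 could be combined with the per-MAC lookup described in the question: the script below stays inside iPXE and retries the lookup itself, instead of exiting back to the BIOS when no script is found. The server name `boot.example.com` and the `<mac>.ipxe` file naming are assumptions for illustration, not details of the original setup.

```ipxe
#!ipxe
# Hypothetical embedded script. The URL and the per-MAC file naming
# are assumptions, adjust them to the real boot server layout.

:retry
# Configure the network; on failure, wait and try again inside iPXE.
dhcp || goto wait
# Fetch and run a script named after this machine's MAC address,
# e.g. 52-54-00-12-34-56.ipxe. If it is missing, wait and retry.
chain http://boot.example.com/${mac:hexhyp}.ipxe || goto wait

:wait
# Reached on any failure above, or if the chained script returns.
# Pause so the loop does not hammer the DHCP and HTTP servers.
sleep 10
goto retry
```

A script like this is normally embedded at build time, for example with something like `make bin/undionly.kpxe EMBED=retry.ipxe` (the script file name here is made up). Note that this loop only retries the network lookup; local disks or USB sticks are not re-checked unless the script does that explicitly.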
