SuperMicro Superblade fails to boot from hard drive

blade-server grub megaraid supermicro

Here is my issue. I have a number of Superblades with LSI MegaRAID SAS 9240-4i cards, one card per server. All of them run the latest firmware listed on the LSI website as of this writing (20.13.1-0176).

Here is a sample configuration from one of the servers that do boot (top lines from megacli -AdpAllInfo -aALL).

                    Versions
                ================
Product Name    : LSI MegaRAID SAS 9240-4i
Serial No       : SP10195095
FW Package Build: 20.13.1-0176

                    Mfg. Data
                ================
Mfg. Date       : 01/12/11
Rework Date     : 00/00/00
Revision No     : 03A
Battery FRU     : N/A

                Image Versions in Flash:
                ================
BIOS Version       : 4.38.02.0_4.16.08.00_0x06060900
Preboot CLI Version: 03.02-020:#%00009
WebBIOS Version    : 4.0-60-e_49-Rel
NVDATA Version     : 3.09.03-0056
FW Version         : 2.130.404-3067
Boot Block Version : 2.02.00.00-0001
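Since the real question is what differs between the servers that boot and those that don't, one option is to save the megacli -AdpAllInfo -aALL output from a booting and a non-booting server and diff the "Key : Value" lines. A minimal Python sketch, assuming the output has the layout shown above; parse_adapter_info and diff_adapter_info are hypothetical helper names:

```python
import re
from typing import Dict, Optional, Tuple

# Matches "Key : Value" lines from megacli -AdpAllInfo -aALL. The key stops
# at the first colon, so values that contain colons (e.g. the Preboot CLI
# version) stay intact. "=====" underlines and blank lines do not match.
KV_LINE = re.compile(r"^\s*([^:]+?)\s*:\s*(.*?)\s*$")


def parse_adapter_info(text: str) -> Dict[str, str]:
    """Return {key: value} for every "Key : Value" line in the output."""
    info = {}
    for line in text.splitlines():
        match = KV_LINE.match(line)
        if match:
            info[match.group(1)] = match.group(2)
    return info


def diff_adapter_info(good: str, bad: str) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Map each key whose value differs to (value_on_good, value_on_bad)."""
    a, b = parse_adapter_info(good), parse_adapter_info(bad)
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }
```

Serial No and Mfg. Date will naturally differ from card to card; any other key that shows up in the diff would be worth a closer look.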

All of the controllers I am asking about report no problems and show the array as healthy. Two of them are running a consistency check at the moment, but overall five out of ten do not boot from the hard drive.

Symptoms

The BIOS is set to "optimal defaults". I have changed the boot priority to add a CD-ROM for those that did not boot on their own.

All of the systems can be booted up fine using a CD-ROM (Ubuntu 14.04.1 amd64). However, only five out of ten boot from the (virtual, i.e. RAID) hard drive after the MegaRAID BIOS has finished.

The remaining five get stuck right after the MegaRAID BIOS shows its stats and the prompt for WebBIOS etc.: the screen blanks, shows only a (non-blinking) cursor, and nothing else happens. I have waited a really long time for a BIOS error message about a missing hard drive or similar, but nothing appears. If I insert the CD and tell its boot manager to boot from the first hard disk, the same symptoms appear.

My gut feeling is that something is wrong with the boot sector, the boot manager or similar, but I have found no practical way to confirm it.
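One way to test the boot-sector suspicion is to read sector 0 of the virtual drive from the live CD on a booting and a non-booting server and compare the two. On a GPT disk, sector 0 should be a protective MBR (first partition entry of type 0xEE, signature 0x55AA at offset 510); if GRUB's first stage was installed there, the 440-byte boot code area is normally non-empty, and GRUB 2's boot.img usually contains the string "GRUB". A minimal sketch; inspect_mbr is a hypothetical helper and the "GRUB" check is a heuristic, not proof:

```python
SECTOR = 512


def inspect_mbr(sector0: bytes) -> dict:
    """Summarise LBA 0: boot signature, protective MBR entry, boot code area."""
    if len(sector0) < SECTOR:
        raise ValueError("need a full 512-byte sector")
    boot_code = sector0[:440]  # bootstrap code area of a classic MBR
    first_type = sector0[446 + 4]  # type byte of the first partition entry
    return {
        "has_55aa_signature": sector0[510:512] == b"\x55\xaa",
        "protective_mbr": first_type == 0xEE,
        "boot_code_empty": not any(boot_code),
        # Heuristic only: GRUB 2's boot.img normally embeds the string "GRUB".
        "looks_like_grub": b"GRUB" in boot_code,
    }
```

Assuming the RAID volume shows up as /dev/sda, running something like with open("/dev/sda", "rb") as disk: print(inspect_mbr(disk.read(512))) as root on one server of each kind would show whether their boot sectors actually differ.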

When I try to use grub-setup (from a booted live CD) I get:

grub-setup: warn: This GPT partition label has no BIOS Boot Partition; embedding won't be possible!.
grub-setup: warn: Embedding is not possible.  GRUB can only be installed in this setup by using blocklists.  However, blocklists are UNRELIABLE and their use is discouraged..
grub-setup: error: will not proceed with blocklists.

GRUB is of course right: this is a GPT disk, because at 5.4 TiB it is too large for an MBR partition table. However, some of them boot fine while others don't, even though for all practical purposes they should be (and behave) identical.
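The arithmetic behind the GPT requirement: MBR partition entries store the start sector and sector count as 32-bit integers, so with 512-byte sectors one MBR partition tops out at 2 TiB. A small check using the 5997GB size that parted reports for this disk (needs_gpt is a hypothetical helper; parted's "GB" is decimal and rounded, so the byte count is approximate):

```python
SECTOR_BYTES = 512
# MBR partition entries store start LBA and sector count as 32-bit integers.
MBR_MAX_SECTORS = 2**32


def needs_gpt(disk_bytes: int, sector_bytes: int = SECTOR_BYTES) -> bool:
    """True if the disk is larger than one MBR partition can cover."""
    return disk_bytes > MBR_MAX_SECTORS * sector_bytes


mbr_limit_tib = MBR_MAX_SECTORS * SECTOR_BYTES / 2**40  # 2.0 TiB
disk_bytes = 5997 * 10**9  # approx., from "Disk /dev/sda: 5997GB"
print(f"MBR limit: {mbr_limit_tib} TiB, disk: {disk_bytes / 2**40:.2f} TiB, "
      f"needs GPT: {needs_gpt(disk_bytes)}")
```

This prints "MBR limit: 2.0 TiB, disk: 5.45 TiB, needs GPT: True".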

The partition setup looks like this:

# parted /dev/sda print
Model: LSI MR9240-4i (scsi)
Disk /dev/sda: 5997GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt

Number  Start   End     Size    File system     Name  Flags
 1      1049kB  1024MB  1023MB  ext4                  boot
 2      1024MB  25.6GB  24.6GB  ext4                  msftdata
 3      25.6GB  50.2GB  24.6GB  linux-swap(v1)
 4      50.2GB  74.8GB  24.6GB  ext4                  msftdata
 5      74.8GB  5997GB  5922GB  ext4                  msftdata

Unlike some other servers I manage, this one doesn't have a "BIOS boot" partition. I'm not sure that matters here: if it did, why would the others boot? (Yes, they have exactly the same layout; all of them were set up with the same preseeded installation CD.)
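GRUB's warning is about exactly this partition: on a BIOS/GPT disk, GRUB embeds its core image into a partition with type GUID 21686148-6449-6E6F-744E-656564454649. To check for it, and for free space in front of partition 1, independently of parted, one can parse the primary GPT directly. A minimal sketch, assuming the standard layout (header at LBA 1, partition array within the first sectors) and skipping CRC checks and the backup GPT; read_gpt and bios_boot_report are hypothetical helpers:

```python
import struct
import uuid
from typing import List, NamedTuple

SECTOR = 512
# Type GUID GRUB looks for when embedding core.img on a BIOS/GPT disk.
BIOS_BOOT_GUID = uuid.UUID("21686148-6449-6E6F-744E-656564454649")


class GptPartition(NamedTuple):
    type_guid: uuid.UUID
    first_lba: int
    last_lba: int


def read_gpt(head: bytes, sector: int = SECTOR) -> List[GptPartition]:
    """Parse the primary GPT from the first sectors of a disk (no CRC checks)."""
    header = head[sector:2 * sector]  # primary header lives at LBA 1
    if header[:8] != b"EFI PART":
        raise ValueError("no GPT header at LBA 1")
    # Offset 72: entries start LBA (u64), entry count (u32), entry size (u32).
    entries_lba, count, entry_size = struct.unpack_from("<QII", header, 72)
    if len(head) < entries_lba * sector + count * entry_size:
        raise ValueError("read more sectors: partition array is truncated")
    parts = []
    for i in range(count):
        off = entries_lba * sector + i * entry_size
        # GPT stores GUIDs in mixed-endian order, which bytes_le decodes.
        type_guid = uuid.UUID(bytes_le=bytes(head[off:off + 16]))
        first, last = struct.unpack_from("<QQ", head, off + 32)
        if type_guid.int != 0:  # an all-zero type GUID marks an unused slot
            parts.append(GptPartition(type_guid, first, last))
    return parts


def bios_boot_report(head: bytes, sector: int = SECTOR) -> dict:
    parts = read_gpt(head, sector)
    first_usable = struct.unpack_from("<Q", head, sector + 40)[0]
    first_start = min((p.first_lba for p in parts), default=None)
    return {
        "has_bios_boot_partition": any(p.type_guid == BIOS_BOOT_GUID for p in parts),
        # Unallocated sectors between the first usable LBA and the first
        # partition, i.e. room for a BIOS boot partition without moving data.
        "free_sectors_before_first_partition":
            None if first_start is None else first_start - first_usable,
    }
```

From the live CD, something like with open("/dev/sda", "rb") as disk: print(bios_boot_report(disk.read(34 * 512))) should work for a typical 128-entry table. parted shows partition 1 starting at 1049kB, i.e. LBA 2048, so with the usual first usable LBA of 34 the report should show about 2014 free sectors (roughly 1 MiB), which is likely enough room for a small BIOS boot partition in front of partition 1.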

Any ideas on:

  • how to debug this boot problem (the cursor and blank screen really do not help)?
  • how to make a system such as this one bootable, even without GRUB if need be?

Best Answer

I have seen something similar on some Supermicro MicroCloud blades. To fix it:

  1. Go to the BIOS PCI settings and change the compliance setting so that non-compliant devices are detected.
  2. There is also a BIOS setting that changes the boot order so that the Intel Netbios boot loads before the PCI device.
  3. Press Enter when it is stuck at the black screen.

I am quite sure the problem is the legacy LSI SAS 9240-4i card. If you have the budget to switch to a different model such as the 9260, that should solve your issue.

Hope it helps.