Grub2 does not find /boot on RAID1 md0 device


I have RAID1 md0 for /boot consisting of 4 partitions (sda2, sdb2, sdc2, sdd2).
I'm using GPT on 2 TB HDDs, so the first partition on each disk (sda1, …) is a 1 MB bios_grub partition.

I also have RAID10 md1 for LVM (containing /) and RAID0 md2 for swap, both built from partitions on all 4 drives.

The mdadm persistent superblock (metadata) version is 0.9.
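For context, 0.90 metadata is stored at the end of each member partition, which is why a RAID1 member such as sda2 also looks like an ordinary filesystem to software that knows nothing about md. The sketch below computes where that superblock starts. It assumes the layout from the Linux kernel's md_p.h (a 64 KiB reserved block, aligned to 64 KiB, at the end of the device). The partition size is a made-up example.

```python
# Sketch: locate the md 0.90 superblock on a member partition.
# Assumption: layout as in the kernel's MD_NEW_SIZE_SECTORS macro, i.e. the
# superblock occupies the last 64 KiB-aligned 64 KiB block of the device.
MD_RESERVED_BYTES = 64 * 1024
MD_RESERVED_SECTORS = MD_RESERVED_BYTES // 512  # 128 sectors of 512 bytes


def sb090_offset_sectors(device_sectors):
    """Return the sector where a 0.90 superblock starts on a device of this size."""
    if device_sectors < MD_RESERVED_SECTORS:
        raise ValueError("device too small to hold the 0.90 reserved area")
    # Round the size down to a 64 KiB boundary, then step back one 64 KiB block.
    return (device_sectors & ~(MD_RESERVED_SECTORS - 1)) - MD_RESERVED_SECTORS


if __name__ == "__main__":
    size = 1_000_000  # hypothetical partition size in 512-byte sectors
    off = sb090_offset_sectors(size)
    print(off, off * 512)  # 999808 511901696
```

For RAID1, everything before that offset is the filesystem itself, so a member can be read directly as if it were a plain partition.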

Grub was installed on all 4 drives (hd0, hd1, hd2, hd3) with something like grub-install --modules="mdraid lvm" '(hd0)'.
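Installing GRUB on every drive is what allows any one of them to boot. The following minimal sketch only builds and prints the per-drive commands. It assumes four BIOS drives and the module list quoted above. Because it executes nothing, it is safe to try on a remote machine.

```python
# Sketch: build the per-drive grub-install invocations described above.
# Assumption: BIOS drives hd0..hd(N-1) and the module list quoted in the
# question; commands are only constructed and printed, never executed.
import shlex


def grub_install_commands(num_drives, modules=("mdraid", "lvm")):
    """Return one argv list per BIOS drive (hd0, hd1, ...)."""
    return [
        ["grub-install", "--modules=" + " ".join(modules), f"(hd{i})"]
        for i in range(num_drives)
    ]


if __name__ == "__main__":
    for argv in grub_install_commands(4):
        # shlex.join quotes each argument the way a POSIX shell needs it
        print(shlex.join(argv))
```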

The problem.

On reboot, grub2 fails with "error: no such disk" and displays "grub rescue>" prompt.
The ls command shows only the 4 disks and their partitions, but no md* devices.
Running insmod normal again gives "error: no such disk".
Examining 'root' and 'prefix' shows something like '(md0)/grub', which is correct.
Running set prefix=(hd0,2)/grub and then insmod normal lets the system boot normally.
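The workaround succeeds because (hd0,2) is simply sda2, one member of md0. With 0.90 metadata, that member holds an ordinary filesystem, so GRUB can read it without assembling the array. The sketch below translates such a GRUB2 device spec into the Linux partition it most likely refers to. It assumes the BIOS disk order matches the kernel's sda, sdb, … order, and that GRUB2 numbering applies (disks counted from 0, partitions from 1).

```python
# Sketch: map a GRUB2 device spec like "(hd0,2)/grub" to a Linux partition.
# Assumptions: BIOS disk order == kernel sdX order; GRUB2 numbering
# (disks from 0, partitions from 1); an optional "gpt"/"msdos" prefix is allowed.
import re
import string

_SPEC = re.compile(r"^\(hd(\d+),(?:[a-z]+)?(\d+)\)(/.*)?$")


def grub_to_linux(spec):
    """Return (linux_device, path) for a spec such as '(hd0,2)/grub'."""
    m = _SPEC.match(spec)
    if m is None:
        raise ValueError(f"unsupported GRUB device spec: {spec!r}")
    disk, part, path = int(m.group(1)), int(m.group(2)), m.group(3) or "/"
    if disk >= len(string.ascii_lowercase):
        raise ValueError("this sketch only handles sda..sdz")
    return f"/dev/sd{string.ascii_lowercase[disk]}{part}", path


if __name__ == "__main__":
    print(grub_to_linux("(hd0,2)/grub"))  # ('/dev/sda2', '/grub')
```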

The question.

Why doesn't grub2 see md0?

So far the only solution I can see is to build the GRUB image manually with a hard-coded working prefix (grub-mkimage --prefix='(hd0,2)/grub'), then use grub-setup to write the image to each disk. However, this solution is ugly and error-prone (to avoid mistakes, I would need to investigate how grub-install calls these two commands). I would appreciate better solutions. (Note: this is a remote server, so I can't really do 'reboot debugging'.)

Best Answer

RAID is still one of the gray areas of bootloaders IMHO.

I recently built a RAID1 system, and after a few hours of trying to get LILO/GRUB/GRUB2 to detect my RAID, I gave up. I just told the bootloader to use the first partition of the first HDD detected, and made sure that if an HDD failed, the next HDD was already lined up with the correct MBR, bootloader, etc.

So it boots, loads the kernel and initrd from the first HDD (no RAID involved), then starts the kernel and leaves all the RAID handling to the kernel. Because GRUB/LILO do not write to the drives, this won't damage them.

Basically, I just ignored RAID altogether for the bootloader stage.
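The fallback idea can be expressed as a small model. This is a hypothetical model, not documented behaviour of any particular BIOS. It assumes that a dead drive drops out of the BIOS boot order, so "(hd0,N)" then resolves to partition N on the next surviving drive. Because that partition is a RAID1 member, it holds an identical copy of /boot.

```python
# Hypothetical model of BIOS drive fallback: drives are listed in BIOS order,
# a failed drive disappears from enumeration, and every drive carries the same
# boot loader pointing at "(hd0,N)". Drive labels are placeholders.

def boot_source(bios_order, failed, partition):
    """Return (physical_drive, partition) that "(hd0,partition)" resolves to, or None."""
    present = [drive for drive in bios_order if drive not in failed]
    if not present:
        return None  # nothing left to boot from
    return present[0], partition


if __name__ == "__main__":
    order = ["disk1", "disk2", "disk3", "disk4"]
    print(boot_source(order, set(), 1))      # ('disk1', 1)
    print(boot_source(order, {"disk1"}, 1))  # ('disk2', 1)
```

Under this model, the scheme only helps when the failed drive actually disappears from the BIOS order. A drive that is still detected but unreadable would remain hd0.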

The kernel needs to re-assemble the RAID arrays even if GRUB has already done so, so there's no real reason for GRUB to be RAID-aware on a RAID1 system unless a drive fails during boot.

P.S. You don't need RAID0 for swap; the kernel can already stripe across swap devices. Just give every swap device the same priority (e.g. 1) in /etc/fstab:

/dev/sda2         none                    swap  sw,pri=1        0 0
/dev/sdb2         none                    swap  sw,pri=1        0 0
etc.
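With more than two swap partitions (the question has four drives), generating the entries is less error-prone than typing them. Equal pri= values cause the kernel to allocate swap pages round-robin across those areas. The minimal sketch below only produces the fstab lines and does not edit /etc/fstab. The device names are placeholders.

```python
# Sketch: generate fstab swap entries that all share one priority, so the
# kernel spreads swap pages round-robin across them (no md RAID0 needed).
# Device names passed in are placeholders for your own swap partitions.

def swap_fstab_lines(devices, priority=1):
    """Return one tab-separated fstab line per swap device, all with the same priority."""
    return [f"{dev}\tnone\tswap\tsw,pri={priority}\t0\t0" for dev in devices]


if __name__ == "__main__":
    for line in swap_fstab_lines(["/dev/sda2", "/dev/sdb2", "/dev/sdc2", "/dev/sdd2"]):
        print(line)
```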

Be aware, though, that if a single swap drive fails during normal operation, there's a very good chance your system will crash. (You can put swap on RAID1, just not through fstab like above.)
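For mirrored swap, the usual approach is to create an md RAID1 device, format it with mkswap, and list it in fstab like any other swap device. The sketch below only assembles the commands and executes nothing. /dev/md3, /dev/sda3 and /dev/sdb3 are hypothetical names, not partitions from the question.

```python
# Sketch: commands for swap on an md RAID1 mirror (nothing is executed).
# /dev/md3 and the member partitions used below are hypothetical names.
import shlex


def raid1_swap_plan(md_device, members):
    """Return (mdadm argv, mkswap argv, fstab line) for a mirrored swap device."""
    if len(members) < 2:
        raise ValueError("RAID1 needs at least two member devices")
    create = ["mdadm", "--create", md_device, "--level=1",
              f"--raid-devices={len(members)}", *members]
    mkswap = ["mkswap", md_device]
    fstab = f"{md_device}\tnone\tswap\tsw\t0\t0"
    return create, mkswap, fstab


if __name__ == "__main__":
    create, mkswap, fstab = raid1_swap_plan("/dev/md3", ["/dev/sda3", "/dev/sdb3"])
    print(shlex.join(create))
    print(shlex.join(mkswap))
    print(fstab)
```

Unlike the equal-priority entries, losing one member here leaves swap intact, because the array keeps running degraded.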