RAID6 mdraid -> LVM -> EXT4 root with GRUB2

Setup as of 2012-03-31: a Debian Wheezy daily build in VirtualBox 4.1.2 with 6 disk devices.

My steps to reproduce so far:

  1. On each disk, set up a single partition spanning the whole disk as a physical volume for RAID
  2. Set up a single RAID6 mdraid array from all of those partitions
  3. Use the resulting md0 as the only physical volume for the volume group
  4. Set up your logical volumes, filesystems and mount points as you wish
  5. Install your system
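
As a rough illustration of the same stack built by hand rather than in the installer, here is a small Python sketch that only prints a roughly equivalent command sequence for steps 1 to 4; it executes nothing. The device names /dev/sda to /dev/sdf, the volume group name `vg0` and the logical volume layout are hypothetical placeholders, not taken from the original setup.

```python
# Sketch: print a command sequence roughly equivalent to steps 1-4.
# Nothing is executed. All names below are hypothetical placeholders:
# six disks /dev/sda../dev/sdf, volume group "vg0", and the LV layout in LVS.
# These commands destroy data; review the output before running any of it.

DISKS = [f"/dev/sd{c}" for c in "abcdef"]          # hypothetical device names
VG = "vg0"                                          # hypothetical VG name
LVS = [("root", "8G", "/"), ("usr", "8G", "/usr")]  # hypothetical (name, size, mount point)


def build_commands(disks, vg, lvs):
    """Return shell command strings for steps 1-4, in order."""
    cmds = []
    # Step 1: one whole-disk partition per disk, flagged for RAID.
    # Starting at 1MiB leaves the gap after the MBR free for GRUB's core image.
    for d in disks:
        cmds.append(f"parted -s {d} mklabel msdos mkpart primary 1MiB 100% set 1 raid on")
    # Step 2: a single RAID6 array across all of those partitions.
    parts = " ".join(f"{d}1" for d in disks)
    cmds.append(f"mdadm --create /dev/md0 --level=6 --raid-devices={len(disks)} {parts}")
    # Step 3: md0 is the only physical volume in the volume group.
    cmds.append("pvcreate /dev/md0")
    cmds.append(f"vgcreate {vg} /dev/md0")
    # Step 4: one ext4 filesystem per logical volume. The mount point is
    # unused here; it would go into /etc/fstab.
    for name, size, _mount_point in lvs:
        cmds.append(f"lvcreate -L {size} -n {name} {vg}")
        cmds.append(f"mkfs.ext4 /dev/{vg}/{name}")
    return cmds


def raid6_usable(n_disks, disk_size):
    """Usable capacity of RAID6: two disks' worth of space holds parity."""
    return (n_disks - 2) * disk_size


if __name__ == "__main__":
    for cmd in build_commands(DISKS, VG, LVS):
        print(cmd)
```

With six disks, `raid6_usable(6, size)` gives four disks' worth of space, and the array survives the loss of any two disks.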

Both / and /boot will be in this stack. I've chosen EXT4 as my filesystem for this setup.

I can get as far as the GRUB2 rescue console, which can see the mdraid array, the volume group and the LVM logical volumes on it (all named correctly at every level), but I cannot ls the filesystem contents of any of them, and I cannot boot from them.

As far as I can tell from the documentation, the version of GRUB2 shipped in Wheezy should handle all of this gracefully.

http://packages.debian.org/wheezy/grub-pc (1.99-17 at the time of writing.)

According to the generated grub.cfg, it loads the ext2, raid, raid6rec, dosmbr (listed once per disk) and lvm modules. The generated grub.cfg also defines the list of modules to load twice; from some quick Googling, this seems to be normal and harmless for GRUB2.
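
To check what a generated grub.cfg actually loads, a short Python sketch can count the `insmod` lines; counting rather than deduplicating makes a repeated module list visible. It assumes plain `insmod <name>` lines, and the set of required modules is an assumption drawn from the list above (the partition-table module is left out).

```python
import sys
from collections import Counter

# Modules this stack needs, taken from the question's list
# (an assumption; the partition-table module is omitted).
REQUIRED = {"raid", "raid6rec", "lvm", "ext2"}


def insmod_counts(cfg_text):
    """Count how often each module appears in an "insmod <name>" line."""
    counts = Counter()
    for line in cfg_text.splitlines():
        words = line.split()
        if len(words) == 2 and words[0] == "insmod":
            counts[words[1]] += 1
    return counts


def missing_modules(cfg_text, required=REQUIRED):
    """Return the required modules that never appear in an insmod line, sorted."""
    return sorted(required - set(insmod_counts(cfg_text)))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: pass the path to grub.cfg, e.g. /boot/grub/grub.cfg on Debian.
    with open(sys.argv[1]) as f:
        text = f.read()
    for module, n in sorted(insmod_counts(text).items()):
        print(f"{module}: {n}")
    print("missing:", ", ".join(missing_modules(text)) or "none")
```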

How can I get further, i.e. get GRUB2 to actually read the contents of the filesystems and boot the system?

Which of my assumptions about what GRUB2 supports here is wrong?

EDIT (2012-04-01)
My generated grub.cfg:

http://pastie.org/3708436

It seems to set my /usr logical volume as the root first; could that be the source of the failure? Is this a grub-mkconfig bug, or is GRUB supposed to access something on /usr before / and /boot? For me, /boot lives on /; there is no separate boot logical volume.
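
To see where /usr gets selected, a Python sketch can list every `set root=` in grub.cfg in order, together with the menuentry that encloses it (or "header" if none). One possible explanation, not confirmed in this thread, is that the grub-mkconfig header sets root to whichever device holds the GRUB font file, which on Debian typically lives under /usr/share/grub; a header-level `set root` pointing at /usr would then be expected. The parser assumes simple, non-nested menuentry blocks with quoted titles.

```python
import re
import sys

# Matches lines such as: set root='...'   (quotes optional)
SET_ROOT = re.compile(r"""^\s*set\s+root=(['"]?)(.*?)\1\s*$""")
# Matches the start of a menuentry with a quoted title.
MENUENTRY = re.compile(r"""^menuentry\s+(['"])(.*?)\1""")


def root_assignments(cfg_text):
    """Return (line number, enclosing menuentry title or None, root value)."""
    results = []
    entry = None  # None while in the header, outside any menuentry
    for lineno, line in enumerate(cfg_text.splitlines(), start=1):
        stripped = line.strip()
        m = MENUENTRY.match(stripped)
        if m:
            entry = m.group(2)
        elif stripped == "}":
            entry = None  # assumes menuentry blocks are not nested
        m = SET_ROOT.match(line)
        if m:
            results.append((lineno, entry, m.group(2)))
    return results


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: pass the path to grub.cfg as the only argument.
    with open(sys.argv[1]) as f:
        for lineno, entry, root in root_assignments(f.read()):
            print(f"line {lineno}: root={root} ({entry or 'header'})")
```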

Best Answer

In the end, it was a Grub2 bug with a degraded software RAID array.

Grub2 1.9x has problems booting from a degraded array. Booting the system in rescue mode and letting the RAID array resync fixed the issue for the original setup in question.
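
Since the culprit was a degraded array, it may be worth checking the array state before a reboot. A minimal Python sketch, assuming the usual /proc/mdstat layout (an `mdX : active ...` line followed by a status line containing `[n/m] [UU_...]`), that flags arrays with missing members or a recovery/resync in progress:

```python
import os
import re

ARRAY = re.compile(r"^(md\d+)\s*:")                     # "md0 : active raid6 ..."
STATUS = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")   # "[6/5] [UUUUU_]"
SYNC = re.compile(r"(recovery|resync|reshape)\s*=\s*([\d.]+)%")


def array_health(mdstat_text):
    """Map each md array to {"degraded": bool, "sync": (kind, percent) or None}."""
    health = {}
    current = None
    for line in mdstat_text.splitlines():
        m = ARRAY.match(line)
        if m:
            current = m.group(1)
            health[current] = {"degraded": False, "sync": None}
            continue
        if current is None:
            continue  # e.g. the "Personalities" line
        s = STATUS.search(line)
        if s:
            total, active = int(s.group(1)), int(s.group(2))
            # Fewer active members than slots, or an underscore, means degraded.
            health[current]["degraded"] = active < total or "_" in s.group(3)
        y = SYNC.search(line)
        if y:
            health[current]["sync"] = (y.group(1), float(y.group(2)))
    return health


if __name__ == "__main__" and os.path.exists("/proc/mdstat"):
    with open("/proc/mdstat") as f:
        for name, state in sorted(array_health(f.read()).items()):
            flag = "DEGRADED" if state["degraded"] else "ok"
            print(f"{name}: {flag}, sync={state['sync']}")
```

On a healthy six-disk RAID6 the status line reads `[6/6] [UUUUUU]`; an underscore marks a missing or rebuilding member. `mdadm --detail /dev/md0` reports the same state in more detail.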

Incidentally, as of 2012-06-26 the same setup works out of the box on Fedora 17, Arch (stable) and Gentoo (stable plus the latest grub2 bzr via Portage): Grub2 2.0+ has fixed the issue. With the Wheezy freeze coming soon, I very much hope the issue gets resolved either by moving to 2.0 or by backporting the fix.

For me this still affects Debian 6 and 7, and Ubuntu 8.04, 10.04 and 12.04.

Letting the RAID array resync in single-user recovery mode is an acceptable workaround on a home system, but a potential extra hitch when rebooting a production server (even a small office file server) makes one think twice.