Lvm – RAID6 mdraid -> LVM -> EXT4 root with GRUB2

Tags: debian-wheezy, ext4, grub2, lvm, mdraid

Environment: Debian Wheezy daily build of 2012-03-31, running in VirtualBox 4.1.2 with 6 disk devices.

My steps to reproduce so far:

Set up one partition per disk, spanning the entire disk, as a physical volume for RAID
Set up a single RAID6 mdraid array out of all of those partitions
Use the resulting md0 as the only physical volume for the volume group
Set up your logical volumes, filesystems and mount points as you wish
Install your system
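
Outside the Debian installer, the steps above might look roughly like the following from a shell. This is only a sketch under assumptions: the six disks are taken to be /dev/sda through /dev/sdf with one partition each, and the volume group name vg0, the logical volume name root and the 10G size are made up for illustration. The run helper only prints the commands unless DRY_RUN=0 is set, because executing them destroys existing data.

```shell
#!/bin/sh
# Print each command by default; set DRY_RUN=0 to really execute it.
# WARNING: executing these commands destroys data on the listed devices.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

build_stack() {
    # One RAID6 array across the first partition of six disks (assumed names).
    run mdadm --create /dev/md0 --level=6 --raid-devices=6 \
        /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1
    # md0 becomes the only physical volume of the volume group.
    run pvcreate /dev/md0
    run vgcreate vg0 /dev/md0
    # Hypothetical root logical volume carrying an ext4 filesystem.
    run lvcreate -L 10G -n root vg0
    run mkfs.ext4 /dev/vg0/root
}

build_stack
```

Run as-is, this prints the five commands without touching any device.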

Both / and /boot will be in this stack. I've chosen EXT4 as my filesystem for this setup.

I can get as far as the GRUB2 rescue console. It can see the mdraid array, the volume group and the LVM logical volumes on it (all named appropriately on all levels), but I cannot ls the filesystem contents of any of them and I cannot boot from them.

As far as I can see from the documentation, the version of GRUB2 shipped there should handle all of this gracefully.

http://packages.debian.org/wheezy/grub-pc (1.99-17 at the time of writing.)

According to the generated grub.cfg file, it is loading the ext2, raid, raid6rec, dosmbr (this one is in the list of modules once per disk) and lvm modules. The generated grub.cfg file also defines the list of modules to be loaded twice; according to some quick Googling around, this seems to be the norm and OK for GRUB2.

How can I get further, so that GRUB2 is actually able to read the contents of the filesystems and boot the system?

Which of my assumptions about the functionality are wrong here?
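
To check which modules a generated grub.cfg loads and how often, something like the following could be used. It is a minimal sketch: count_insmod is a made-up helper name, and the default path /boot/grub/grub.cfg is an assumption about where the file lives (it is the usual Debian location).

```shell
#!/bin/sh
# Count how often each module is loaded via insmod in a grub.cfg.
# The default path is an assumption; pass another file as the first argument.
count_insmod() {
    grep -E '^[[:space:]]*insmod[[:space:]]+' "${1:-/boot/grub/grub.cfg}" |
        awk '{ print $2 }' | sort | uniq -c
}
```

Calling count_insmod with no argument reads the default file; each output line is a count followed by a module name.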

EDIT (2012-04-01)
My generated grub.cfg:

http://pastie.org/3708436

It seems it first makes my /usr logical volume the root, and that might be the source of the failure? A grub-mkconfig bug? Or is it supposed to get access to stuff from /usr before / and /boot? /boot is on / for me – no separate boot logical volume.

Best Answer

In the end, it was a Grub2 bug/issue with a degraded software RAID array.

Grub2 1.9x has issues with booting from a degraded array. Booting the system in rescue mode and letting the RAID array recover itself fixed the issue for the original setup in question.
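
Before rebooting, it may help to confirm that no array is still degraded. The following is a minimal sketch, assuming the kernel's usual /proc/mdstat layout in which each array has a status field such as [UUUUUU], with an underscore marking a missing member; md_degraded is a made-up helper name.

```shell
#!/bin/sh
# Succeeds (exit status 0) if any array in an mdstat-style file shows a
# missing member, i.e. an underscore inside the [UUUU_U] status field.
# Reads /proc/mdstat unless another file is given as the first argument.
md_degraded() {
    grep -Eq '\[[U_]*_[U_]*\]' "${1:-/proc/mdstat}"
}
```

A typical use would be `if md_degraded; then echo "array still degraded, do not reboot yet"; fi`, run from the rescue system while the array rebuilds.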

Incidentally the setup works (at the moment: 2012-06-26) straight out of the box on Fedora 17, Arch (stable) and Gentoo (stable + latest grub2 bzr via Portage): Grub2 2.0+ has fixed the issue. With the Wheezy freeze hitting soon, I'm thoroughly hoping for the issue to be resolved via either jumping to 2.0 or backporting the fix.

For me this still affects Debian 6, 7; Ubuntu 8.04, 10.04, 12.04.

Letting the raid sync in a single user mode recovery setup is an acceptable workaround for a home system, but having a potential extra hitch for rebooting a production server (even a small office file server) makes one think twice.

Related Solutions

Lvm – Linux Software RAID1: How to boot after (physically) removing /dev/sda? (LVM, mdadm, Grub2)

You need to install GRUB to the MBR of both drives, and you need to do it in a way that GRUB considers each disk to be the first disk in the system.

GRUB uses its own enumeration for disks, which is abstracted from what the Linux kernel presents. You can change which device it thinks is the first disk (hd0), by using a "device" line in the grub shell, like so:

device (hd0) /dev/sdb

This tells grub that, for all subsequent commands, treat /dev/sdb as the disk hd0. From here you can complete the installation manually:

device (hd0) /dev/sdb
root (hd0,0)
setup (hd0)

This installs GRUB to the MBR of the disk it considers to be hd0, which you've just set as /dev/sdb, and tells it to find its files on the first partition of that disk.

I do the same for both /dev/sda and /dev/sdb, just to be sure.
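
The commands above are for the GRUB legacy shell. With GRUB 2, the rough equivalent is usually to run grub-install once per disk. The following is only a sketch under assumptions: it uses Debian's command name grub-install and the two device names from this answer, and the run and install_grub_everywhere helpers are made up for illustration. By default it only prints what it would do.

```shell
#!/bin/sh
# Print each command by default; set DRY_RUN=0 to really execute it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Install GRUB 2 to the MBR of every disk given as an argument, so that
# either disk can boot the system if the other one is removed.
install_grub_everywhere() {
    for disk in "$@"; do
        run grub-install "$disk"
    done
}

install_grub_everywhere /dev/sda /dev/sdb
```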

Edited to add: I always found the Gentoo Wiki handy, until I did this often enough to commit it to memory.

Linux – Any way to recover ext4 filesystems from a deleted LVM logical volume

Every time you perform an operation with LVM, by default, the previous metadata is archived in /etc/lvm/archive. You can use vgcfgrestore to restore it, or grab the extents by hand (harder, but lvcreate(8) should cover it).

Edit:

And to make it as easy as possible, I should add that you can find the last backup before your destructive operation by looking at descriptions:

# grep description /etc/lvm/archive/vg01_*
/etc/lvm/archive/vg01_00001.vg:description = "Created before executing 'lvremove -f /dev/vg01/foo'"
/etc/lvm/archive/vg01_00002.vg:description = "Created before executing 'lvremove -f /dev/vg01/bar'"
/etc/lvm/archive/vg01_00003.vg:description = "Created before executing 'lvremove -f /dev/vg01/baz'"
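
Picking the right archive file and restoring from it could be sketched as follows. The names vg01 and foo come from the listing above; find_archive is a made-up helper, and it assumes the volume was removed with a command line of the form lvremove ... /dev/VG/LV, as in that listing. The restore commands are left as comments because they change the volume group metadata.

```shell
#!/bin/sh
# Print the archive file(s) taken just before the given LV was removed.
# Arguments: volume group, logical volume, optional archive directory.
find_archive() {
    vg=$1 lv=$2 dir=${3:-/etc/lvm/archive}
    grep -l "description = .*lvremove .*/dev/$vg/$lv'" "$dir"/"$vg"_*.vg
}

# Possible use, as root (try the --test dry run first):
#   archive=$(find_archive vg01 foo)
#   vgcfgrestore --test -f "$archive" vg01
#   vgcfgrestore -f "$archive" vg01
#   lvchange -ay vg01/foo
```

Restoring the metadata only brings the logical volume definition back; whether the ext4 filesystem on it is intact depends on nothing having overwritten those extents in the meantime.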

Edit:

The normal allocation policy (the default one) will allocate a stripe from the first free PE when there is enough room to do so. If you want to confirm where the LV was allocated, you can look in the archive files, which are perfectly human-readable.
