Recommended approach for booting from Linux software RAID for redundancy

centos-8 mdadm raid1 software-raid

I am looking for the recommended approach to setting up software RAID (using mdadm) on CentOS 8 to provide redundancy.

In my setup I have 4 identical disks, but I assume the answer should also apply to 2-disk setups.

The primary requirement is redundancy: the system must be able to boot without manual intervention if any one disk fails.

In short – I am looking for the simplest, most reliable solution.

Questions:

  1. What type of partition table is recommended (assuming Linux only, no dual boot)?
  2. UEFI boot vs. BIOS boot? (UEFI seems unnecessarily complex, and there still appear to be many issues with it.)
  3. fdisk/mdadm/LVM commands to set up both a /boot and an LVM volume that will contain the root and swap partitions
  4. Can this all be done from the anaconda installation GUI?

I have had some problems with EFI/GPT systems booting, so I'm not convinced of the advantage: it appears that GPT requires additional partitions, some of which apparently cannot be placed on software RAID. So I am also looking for reasons to go down the EFI/GPT route, if that is indeed the recommended approach.
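For context, the only extra partition GPT needs for legacy BIOS boot is a tiny `bios_grub` partition, which holds GRUB's raw core image (GPT has no post-MBR gap to embed it in). A sketch of such a layout with `parted`, using illustrative device names and sizes:

```shell
# Hypothetical GPT layout for legacy BIOS boot on one disk.
# Partition 1: 1 MiB bios_grub area for GRUB's core image (no filesystem).
# Partition 2: /boot RAID member; partition 3: LVM-on-RAID member.
parted --script /dev/sda \
    mklabel gpt \
    mkpart primary 1MiB 2MiB \
    set 1 bios_grub on \
    mkpart primary 2MiB 1026MiB \
    set 2 raid on \
    mkpart primary 1026MiB 100% \
    set 3 raid on
```

The same layout would be repeated on each disk that should remain bootable.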

Best Answer

I ended up going with a GPT partition table, but sticking with legacy BIOS boot.

This was partly due to the complexities of using RAID for an EFI system partition (https://unix.stackexchange.com/questions/265368/why-is-uefi-firmware-unable-to-access-a-software-raid-1-boot-efi-partition) and partly because I was recovering from an unbootable system that used EFI. (I'm still not sure why it failed, but after a day of trying to fix it, it was easier to rebuild. This is exactly the situation I want to avoid.)

Anyway, it appears this can all be installed and set up via the CentOS 8 anaconda installer. To do so:

  1. Set BIOS to boot your installation media (and HDDs) using legacy boot
  2. Boot the installer
  3. Manually partition your HDDs - I used the following:
    1. 1 MiB BIOS boot (biosboot) partition configured as RAID 1
    2. 1 GiB /boot partition configured as RAID 1
    3. The rest of the HDD space configured as an LVM physical volume on a RAID array (RAID 10 in my case) containing / and swap (and whatever else you want)
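For anyone doing this outside the installer, roughly the same layout can be built by hand with mdadm and LVM. A sketch, assuming the partitions from step 3 already exist on all four disks (device names, sizes, and array names here are illustrative, not taken from the install):

```shell
# Mirror the /boot partitions across all four disks.
# --metadata=1.0 places the md superblock at the end of the device;
# GRUB2 can also read 1.2 metadata, but 1.0 is the conservative choice.
mdadm --create /dev/md/boot --level=1 --raid-devices=4 \
      --metadata=1.0 /dev/sd[abcd]2

# RAID 10 across the large partitions, to be used as the LVM PV.
mdadm --create /dev/md/pv00 --level=10 --raid-devices=4 \
      --chunk=512 /dev/sd[abcd]3

# LVM on top of the RAID 10 array.
pvcreate /dev/md/pv00
vgcreate cl_host01 /dev/md/pv00
lvcreate -L 50G -n root cl_host01
lvcreate -L 4G  -n swap cl_host01

# Filesystems.
mkfs.ext4 /dev/md/boot
mkfs.ext4 /dev/cl_host01/root
mkswap    /dev/cl_host01/swap
```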

I have tested this by pulling each HDD in turn, making sure the server could still boot, and then re-adding the "failed" drive to the md arrays. All works as expected.
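The re-add part of that test can be done from the command line. A sketch, assuming the pulled (or replacement) disk comes back as /dev/sdb:

```shell
# Check array health before and after.
cat /proc/mdstat
mdadm --detail /dev/md/boot

# A genuinely new disk needs the partition table replicated first, e.g.:
#   sgdisk -R /dev/sdb /dev/sda    # copy sda's GPT onto sdb
#   sgdisk -G /dev/sdb             # randomize GUIDs on the copy

# Re-add the disk's members to each array (array/partition names
# assume the layout described above).
mdadm --manage /dev/md/biosboot --add /dev/sdb1
mdadm --manage /dev/md/boot     --add /dev/sdb2
mdadm --manage /dev/md/pv00     --add /dev/sdb3

# Watch the resync progress.
watch cat /proc/mdstat
```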

Below are the relevant parts of a kickstart config that was generated by my install:

#version=RHEL8
ignoredisk --only-use=sda,sdb,sdc,sdd
# Partition clearing information
clearpart --all --initlabel --drives=sda,sdb,sdc,sdd

# Disk partitioning information
part raid.903 --fstype="mdmember" --ondisk=sda --size=2
part raid.1340 --fstype="mdmember" --ondisk=sdd --size=1025
part raid.2107 --fstype="mdmember" --ondisk=sdc --size=475912
part raid.910 --fstype="mdmember" --ondisk=sdb --size=2
part raid.1319 --fstype="mdmember" --ondisk=sda --size=1025
part raid.2114 --fstype="mdmember" --ondisk=sdd --size=475912
part raid.2100 --fstype="mdmember" --ondisk=sdb --size=475912
part raid.2093 --fstype="mdmember" --ondisk=sda --size=475912
part raid.1326 --fstype="mdmember" --ondisk=sdb --size=1025
part raid.1333 --fstype="mdmember" --ondisk=sdc --size=1025
part raid.917 --fstype="mdmember" --ondisk=sdc --size=2
part raid.924 --fstype="mdmember" --ondisk=sdd --size=2
raid biosboot --device=biosboot --fstype="biosboot" --level=RAID1 raid.903 raid.910 raid.917 raid.924
raid pv.2121 --device=pv00 --fstype="lvmpv" --level=RAID10 --chunksize=512 raid.2093 raid.2100 raid.2107 raid.2114
raid /boot --device=boot --fstype="ext4" --level=RAID1 raid.1319 raid.1326 raid.1333 raid.1340
volgroup cl_host01 --pesize=4096 pv.2121
logvol / --fstype="ext4" --size=51200 --name=root --vgname=cl_host01
logvol swap --fstype="swap" --size=4096 --name=swap --vgname=cl_host01
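One thing worth verifying after install: for the "any one disk can fail" requirement, the GRUB boot code must be present on every disk, not just the one the BIOS tried first. Anaconda appears to handle this when biosboot is a RAID 1 across all disks, but it can be checked or repaired by hand (a sketch, using the disk names from the kickstart above):

```shell
# Install the GRUB boot code on every member disk so the BIOS can
# boot from whichever disks survive.
for disk in /dev/sda /dev/sdb /dev/sdc /dev/sdd; do
    grub2-install "$disk"
done
```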