Recommended approach for booting from Linux software RAID for redundancy

centos-8 mdadm raid1 software-raid

I am looking for the recommended approach to setting up software RAID (using mdadm) on CentOS 8 to provide redundancy.

In my setup I have 4 identical disks, but I assume the answer should also apply to 2-disk setups.

The primary requirement is redundancy: the system must be able to boot without manual intervention if any one disk fails.

In short – I am looking for the simplest, most reliable solution.

Questions:

  1. What type of partition table is recommended (assuming Linux only, no dual boot)?
  2. UEFI boot vs. BIOS boot? (UEFI seems unnecessarily complex, and there still appear to be many issues with it.)
  3. fdisk/mdadm/LVM commands to set up both a /boot and an LVM volume that will contain the root and swap partitions
  4. Can this all be done from the anaconda installation GUI?

I have had some problems with EFI/GPT systems booting, so I'm not convinced of the advantage: it appears that GPT requires additional partitions, some of which apparently cannot be placed on software RAID. So I am also looking for reasons to go down the EFI/GPT route, if that is indeed the recommended approach.
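For context, the only extra partition GPT needs for legacy BIOS boot is a tiny `bios_grub` partition, which holds GRUB's raw core image (GPT has no post-MBR gap to embed it in). A sketch of such a layout with `parted`, using illustrative device names and sizes:

```shell
# Hypothetical GPT layout for legacy BIOS boot on one disk.
# Partition 1: 1 MiB bios_grub area for GRUB's core image (no filesystem).
# Partition 2: /boot RAID member; partition 3: LVM-on-RAID member.
parted --script /dev/sda \
    mklabel gpt \
    mkpart primary 1MiB 2MiB \
    set 1 bios_grub on \
    mkpart primary 2MiB 1026MiB \
    set 2 raid on \
    mkpart primary 1026MiB 100% \
    set 3 raid on
```

The same layout would be repeated on each disk that should remain bootable.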

Best Answer

I ended up going with a GPT partition table, but sticking with legacy BIOS boot.

This was partly due to the complexities of using RAID for an EFI system partition (https://unix.stackexchange.com/questions/265368/why-is-uefi-firmware-unable-to-access-a-software-raid-1-boot-efi-partition) and partly because I was recovering from an unbootable system that used EFI. (I'm still not sure why it failed, but after a day of trying to fix it, it was easier to rebuild. This is exactly the situation I want to avoid.)

Anyway, it appears this can all be installed and set up via the CentOS 8 anaconda installer. To do so:

  1. Set BIOS to boot your installation media (and HDDs) using legacy boot
  2. Boot the installer
  3. Manually partition your HDDs - I used the following:
    1. 1 MiB BIOS boot (biosboot) partition configured as RAID 1
    2. 1 GiB /boot partition configured as RAID 1
    3. The rest of the HDD space configured as an LVM physical volume on a RAID array (RAID 10 in my case) containing / and swap (and whatever else you want)
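For anyone doing this outside the installer, roughly the same layout can be built by hand with mdadm and LVM. A sketch, assuming the partitions from step 3 already exist on all four disks (device names, sizes, and array names here are illustrative, not taken from the install):

```shell
# Mirror the /boot partitions across all four disks.
# --metadata=1.0 places the md superblock at the end of the device;
# GRUB2 can also read 1.2 metadata, but 1.0 is the conservative choice.
mdadm --create /dev/md/boot --level=1 --raid-devices=4 \
      --metadata=1.0 /dev/sd[abcd]2

# RAID 10 across the large partitions, to be used as the LVM PV.
mdadm --create /dev/md/pv00 --level=10 --raid-devices=4 \
      --chunk=512 /dev/sd[abcd]3

# LVM on top of the RAID 10 array.
pvcreate /dev/md/pv00
vgcreate cl_host01 /dev/md/pv00
lvcreate -L 50G -n root cl_host01
lvcreate -L 4G  -n swap cl_host01

# Filesystems.
mkfs.ext4 /dev/md/boot
mkfs.ext4 /dev/cl_host01/root
mkswap    /dev/cl_host01/swap
```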

I have tested this by pulling each HDD in turn, making sure the server could still boot, and then re-adding the "failed" drive to the md arrays. All works as expected.
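The re-add part of that test can be done from the command line. A sketch, assuming the pulled (or replacement) disk comes back as /dev/sdb:

```shell
# Check array health before and after.
cat /proc/mdstat
mdadm --detail /dev/md/boot

# A genuinely new disk needs the partition table replicated first, e.g.:
#   sgdisk -R /dev/sdb /dev/sda    # copy sda's GPT onto sdb
#   sgdisk -G /dev/sdb             # randomize GUIDs on the copy

# Re-add the disk's members to each array (array/partition names
# assume the layout described above).
mdadm --manage /dev/md/biosboot --add /dev/sdb1
mdadm --manage /dev/md/boot     --add /dev/sdb2
mdadm --manage /dev/md/pv00     --add /dev/sdb3

# Watch the resync progress.
watch cat /proc/mdstat
```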

Below are the relevant parts of a kickstart config that was generated by my install:

#version=RHEL8
ignoredisk --only-use=sda,sdb,sdc,sdd
# Partition clearing information
clearpart --all --initlabel --drives=sda,sdb,sdc,sdd

# Disk partitioning information
part raid.903 --fstype="mdmember" --ondisk=sda --size=2
part raid.1340 --fstype="mdmember" --ondisk=sdd --size=1025
part raid.2107 --fstype="mdmember" --ondisk=sdc --size=475912
part raid.910 --fstype="mdmember" --ondisk=sdb --size=2
part raid.1319 --fstype="mdmember" --ondisk=sda --size=1025
part raid.2114 --fstype="mdmember" --ondisk=sdd --size=475912
part raid.2100 --fstype="mdmember" --ondisk=sdb --size=475912
part raid.2093 --fstype="mdmember" --ondisk=sda --size=475912
part raid.1326 --fstype="mdmember" --ondisk=sdb --size=1025
part raid.1333 --fstype="mdmember" --ondisk=sdc --size=1025
part raid.917 --fstype="mdmember" --ondisk=sdc --size=2
part raid.924 --fstype="mdmember" --ondisk=sdd --size=2
raid biosboot --device=biosboot --fstype="biosboot" --level=RAID1 raid.903 raid.910 raid.917 raid.924
raid pv.2121 --device=pv00 --fstype="lvmpv" --level=RAID10 --chunksize=512 raid.2093 raid.2100 raid.2107 raid.2114
raid /boot --device=boot --fstype="ext4" --level=RAID1 raid.1319 raid.1326 raid.1333 raid.1340
volgroup cl_host01 --pesize=4096 pv.2121
logvol / --fstype="ext4" --size=51200 --name=root --vgname=cl_host01
logvol swap --fstype="swap" --size=4096 --name=swap --vgname=cl_host01
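One thing worth verifying after install: for the "any one disk can fail" requirement, the GRUB boot code must be present on every disk, not just the one the BIOS tried first. Anaconda appears to handle this when biosboot is a RAID 1 across all disks, but it can be checked or repaired by hand (a sketch, using the disk names from the kickstart above):

```shell
# Install the GRUB boot code on every member disk so the BIOS can
# boot from whichever disks survive.
for disk in /dev/sda /dev/sdb /dev/sdc /dev/sdd; do
    grub2-install "$disk"
done
```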