This is less an actual solution to my problem and more a realization that my expectations of "md+lvm" software RAID were probably unrealistic. I was asking too much in expecting an "md+lvm" system to boot with a hard disk suddenly and mysteriously gone. When a drive (or a partition of a drive) in a RAID configuration actually becomes faulty during use, it will over time generate various errors in log files, etc., and the 'md' RAID software will experience failures when trying to use that drive and/or partition.
Eventually the 'md' software will 'fail' that component, or mark it as faulty. You can see this via "cat /proc/mdstat", in this case showing that the /dev/sdb1 component of the md125 RAID1 mirror is faulty:
[root@host ~]# cat /proc/mdstat
Personalities : [raid1]
md125 : active raid1 sda1[0] sdb1[1](F)
1049536 blocks super 1.0 [2/1] [U_]
bitmap: 0/1 pages [0KB], 65536KB chunk
. . . . .
[root@host ~]#
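Besides /proc/mdstat, the per-component state can also be checked with mdadm itself. A quick sketch, using the same device names as this example (substitute your own):
mdadm --detail /dev/md125      # overall array state, plus the state of each component
mdadm --examine /dev/sdb1      # superblock details for one component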
Using the "mdadm" administrative utility, you can simulate such a failure via:
mdadm --manage /dev/md125 --fail /dev/sdb1
(That's how I produced the above output.) The command:
mdadm --manage /dev/md125 --remove /dev/sdb1
then removes the failed component from the metadevice configuration, and the metadevice runs on the remaining components. When a partition or drive really does fail, then before you can pull the drive it is necessary to 'fail' and 'remove' every partition of that drive from the metadevice of which it is a component (a scripted version of that sequence is sketched after the listing below). After I simulated a drive failure, and the response to it, by doing all of this, I was able to shut down, pull the drive, and reboot the unit, and it came back up into CentOS successfully. All the metadevices (RAID1 mirrors) ran on single sub-mirrors:
[root@reports2 ~]# cat /proc/mdstat
Personalities : [raid1]
md125 : active raid1 sda1[0]
1049536 blocks super 1.0 [2/1] [U_]
bitmap: 1/1 pages [4KB], 65536KB chunk
md126 : active raid1 sda2[0]
1047552 blocks super 1.2 [2/1] [U_]
bitmap: 0/1 pages [0KB], 65536KB chunk
md127 : active raid1 sda3[0]
974529536 blocks super 1.2 [2/1] [U_]
bitmap: 3/8 pages [12KB], 65536KB chunk
unused devices: <none>
[root@reports2 ~]#
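For reference, here is a sketch of the full fail-and-remove sequence mentioned above, assuming the same layout as this example (md125/md126/md127 mirrored across sda1-3 and sdb1-3); adjust the array and partition names to match your own system:
# fail and remove every /dev/sdb component before pulling the drive
mdadm --manage /dev/md125 --fail /dev/sdb1
mdadm --manage /dev/md125 --remove /dev/sdb1
mdadm --manage /dev/md126 --fail /dev/sdb2
mdadm --manage /dev/md126 --remove /dev/sdb2
mdadm --manage /dev/md127 --fail /dev/sdb3
mdadm --manage /dev/md127 --remove /dev/sdb3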
I did not really have a bad drive, so I just shut down, put the drive back in, rebooted, and added the /dev/sdb components back into their respective metadevices via commands like:
mdadm --manage /dev/md125 --add /dev/sdb1
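With the partition-to-array mapping in this example, the full set of re-add commands would be (again, substitute your own device names):
mdadm --manage /dev/md125 --add /dev/sdb1
mdadm --manage /dev/md126 --add /dev/sdb2
mdadm --manage /dev/md127 --add /dev/sdb3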
After doing this with all the mirrors, they will re-sync. The larger the mirror, the longer it takes:
[root@reports2 ~]# !cat
cat /proc/mdstat
Personalities : [raid1]
md125 : active raid1 sdb1[1] sda1[0]
1049536 blocks super 1.0 [2/2] [UU]
bitmap: 0/1 pages [0KB], 65536KB chunk
md126 : active raid1 sdb2[2] sda2[0]
1047552 blocks super 1.2 [2/2] [UU]
bitmap: 0/1 pages [0KB], 65536KB chunk
md127 : active raid1 sdb3[2] sda3[0]
974529536 blocks super 1.2 [2/1] [U_]
[>....................] recovery = 0.0% (883968/974529536) finish=91.7min speed=176793K/sec
bitmap: 3/8 pages [12KB], 65536KB chunk
unused devices: <none>
[root@reports2 ~]#
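The resync can be watched while it runs; a couple of ways that should work on most systems (the speed-limit files are standard kernel tunables, but check your distribution):
watch -n 5 cat /proc/mdstat                  # refresh the status every 5 seconds
mdadm --detail /dev/md127                    # shows "Rebuild Status" while resyncing
cat /proc/sys/dev/raid/speed_limit_min       # kernel resync speed floor (KB/s)
cat /proc/sys/dev/raid/speed_limit_max       # kernel resync speed ceiling (KB/s)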
With a real drive failure, it is of course necessary to first partition the replacement drive identically to the remaining one. Though I did not really need to, I followed the instructions at:
https://www.howtoforge.com/tutorial/linux-raid-replace-failed-harddisk/
and used the "sgdisk" utility to conveniently duplicate the /dev/sda GPT partition table onto /dev/sdb via:
sgdisk -R /dev/sdb /dev/sda
sgdisk -G /dev/sdb
The above link recommends the "gdisk"/"sgdisk" tools as reliable for GPT disklabels/partition-tables. The first command does the actual copy (from "/dev/sda" to "/dev/sdb"; the target is listed before the source, which is perhaps a little counter-intuitive) and the second generates unique UUIDs for /dev/sdb and its partitions.
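Before re-adding the new drive's partitions to the arrays, it is worth confirming that the copied partition table matches the surviving drive; a quick check, assuming the same device names:
sgdisk -p /dev/sda       # print the partition table of the surviving drive
sgdisk -p /dev/sdb       # should now show the same layout (with new GUIDs)
lsblk /dev/sda /dev/sdb  # sanity-check sizes and partition names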
Best Answer
You don't. As you have realised, if you store the password/decryption key with the server, it's accessible should you be rooted or otherwise compromised. If you don't, reboots are non-trivial.
Why do you want to encrypt the entirety of / anyway? What good does encrypting the OS itself do? (Even if you manage it, it will slow your boot, since everything has to be decrypted before it can be used.) That is, why not just encrypt /home, as most distributions offer? Then /home is decrypted and mounted when you log in.
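For completeness, a minimal sketch of setting up an encrypted /home with LUKS; /dev/sdX3 is a hypothetical empty partition, and luksFormat destroys whatever is on it:
cryptsetup luksFormat /dev/sdX3               # initialize LUKS (prompts for a passphrase)
cryptsetup luksOpen /dev/sdX3 home_crypt      # unlock it as /dev/mapper/home_crypt
mkfs.ext4 /dev/mapper/home_crypt              # create a filesystem inside the mapping
mount /dev/mapper/home_crypt /home            # mount it as /home
For unlocking at login rather than at boot, distributions typically wire this up through /etc/crypttab and a PAM module; the details vary, so consult your distribution's documentation.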