Debian – md raid fails to boot with missing drive

debian, md, mdadm, raid, software-raid

I have a server running Debian Jessie with four drives, sda to sdd, all partitioned identically. The system sits on a RAID 1 md array spanning all four drives. Every drive has GRUB installed; I can swap the discs around with each other, each one is bootable on its own, and the system boots up happily. All drives share exactly the same layout:

  sdx1 - Boot Partition, GRUB installed
  sdx2 - Raid 1 /boot
  sdx3 - Raid 1 /
  sdx4 - Raid 10 swap
  sdx5 - non-md btrfs Raid 6 /data
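
For reference, this layout and the health of the arrays can be checked with standard tools. The sketch below is only illustrative; the md device name is an assumption, not taken from the original setup.

  # Discs, partitions, filesystems and mount points at a glance
  lsblk -o NAME,SIZE,TYPE,FSTYPE,MOUNTPOINT

  # State of all md arrays; a degraded RAID 1 shows up as e.g. [UUU_]
  cat /proc/mdstat
  mdadm --detail /dev/md0   # assumed name of the /boot array

  # The non-md btrfs RAID 6 data pool and its member devices
  btrfs filesystem show /data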

The data partition is btrfs RAID 6. I'm currently trying to upgrade my capacity by swapping out a drive for a bigger one. Since the array can tolerate two failures, my first instinct was to simply replace one of the drives, boot back up, restore the degraded raid arrays onto the newly installed drive, and have everything back to normal after the rebuild.
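
For context, the rebuild I had in mind is the usual mdadm workflow. The following is only a sketch; the new drive (sdd), partition and array names are assumptions based on the layout above.

  # Copy the partition table from a surviving drive onto the new one
  # (sfdisk shown for MBR; sgdisk -R would be the GPT equivalent)
  sfdisk -d /dev/sda | sfdisk /dev/sdd

  # Add the new partitions back into their arrays and let them resync
  mdadm /dev/md0 --add /dev/sdd2   # /boot
  mdadm /dev/md1 --add /dev/sdd3   # /
  mdadm /dev/md2 --add /dev/sdd4   # swap
  grub-install /dev/sdd            # make the new drive bootable again

  # The btrfs RAID 6 data partition is repaired separately, e.g.:
  btrfs replace start <missing-devid> /dev/sdd5 /data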

BUT the machine (which is sadly headless at the moment) does not boot once I swap in something that invalidates the raid array. I can swap the existing discs with each other all day long and it happily boots, but if I remove a disc or swap in anything that is not part of the raid, it fails to boot.

Am I missing something? How can I tell md that it is OK to boot automatically with missing discs / a degraded array? In the end, as far as md is concerned, even one of the four discs can carry the whole system by itself. The data partition is another beast, since it needs at least two drives, but md should not be concerned with that, as it is a pure btrfs raid.
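
For completeness: if md itself refuses to assemble with members missing, it can be told explicitly to start degraded arrays, for example from the initramfs emergency shell. The array name below is an assumption, and as the answer shows, md turned out not to be the actual blocker here.

  # Assemble everything found by the scan and start arrays even if degraded
  mdadm --assemble --scan --run

  # Or start a single, already partially assembled array
  mdadm --run /dev/md1   # assumed name of the / array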

I know that for the current use case I could simply remove the drive from the raid, upgrade it, and then put it back, but in the event of an actual failure I won't have the option to remove the drive cleanly if the system fails to start up.
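
For that planned upgrade path, the graceful removal would look roughly like this; again a sketch only, with device and array names assumed from the layout above.

  # Mark the members of the outgoing drive as failed and remove them
  mdadm /dev/md0 --fail /dev/sdd2 --remove /dev/sdd2
  mdadm /dev/md1 --fail /dev/sdd3 --remove /dev/sdd3
  mdadm /dev/md2 --fail /dev/sdd4 --remove /dev/sdd4

  # The btrfs member (sdd5) would be handled with btrfs replace/device commands instead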

Best Answer

As an update and the answer: in the meantime I figured out that the only thing really missing here was the nofail flag in fstab. The data filesystem was degraded and could not be mounted in that state, and without the nofail option set, that failed mount blocked the boot.
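
For anyone hitting the same thing, the relevant fstab entry ends up looking something like this; the UUID is a placeholder and the options beyond nofail are only illustrative. Note that actually mounting a degraded btrfs additionally needs the degraded mount option; nofail just keeps a failed mount of /data from blocking the rest of the boot.

  # /etc/fstab - data pool with nofail so a failed mount no longer stops the boot
  UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx  /data  btrfs  defaults,nofail  0  0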