Linux – How to make the server boot properly after removing/adding a drive in an mdadm software RAID

boot, linux, mdadm, raid, Ubuntu

I configured a RAID-5 array for a data partition mounted on /mnt/data. My system does not boot from it (/ and /boot are on a dedicated drive that is not part of any RAID array).

I added a 4th drive to my 3-disk RAID-5 software array on Ubuntu 12.04 via mdadm (software RAID).
My RAID array contained /dev/sdb1, /dev/sdc1 and /dev/sdd1.

I used this command to add the 4th drive:

mdadm --add /dev/md0 /dev/sde

Then I converted the array to RAID-6 with:

mdadm --grow /dev/md0 --raid-devices 4 --level 6 --backup-file=backup/raid-backup-file
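
As an aside, reshape progress can be watched while it runs with something like this (generic commands, nothing specific to my setup):

watch -n 5 cat /proc/mdstat
mdadm --detail /dev/md0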

It worked great. The server ran fine and there was no issue on boot.
The only problem is that I noticed I had added the 4th drive as the whole disk: I should have added /dev/sde1 instead of /dev/sde!
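
To double-check which devices the array actually holds, something like this shows the member list (device names are from my setup):

mdadm --detail /dev/md0    # the whole-disk /dev/sde shows up here as a member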

To clean that up, I removed the drive and added it back, this time as the partition:

mdadm /dev/md0 --fail /dev/sde --remove /dev/sde
mdadm --add /dev/md0 /dev/sde1

After the migration it worked (the array was accessible), except that at the next boot I got a message saying the RAID array was degraded because /dev/sde was missing (the drive was marked as a spare), and I was dropped to an initramfs prompt. After exiting that shell, the server continued booting without mounting the data partition from my RAID array.
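
For anyone hitting the same thing, the state at that point can be inspected from the prompt with something like:

cat /proc/mdstat    # shows what got auto-assembled, spares included
dmesg | grep md     # shows the kernel's md bind/assembly messages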

After logging in, I was able to stop the RAID array and reassemble it via:

mdadm --stop /dev/md0
mdadm --assemble /dev/md0 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 

and then mount the file system. All the data was there and the RAID array was clean.

However, the system still didn't boot properly.
My last try was to remove the last drive and convert the RAID-6 array back to RAID-5 via:

mdadm /dev/md0 --remove /dev/sde1
mdadm --grow /dev/md0 --raid-devices 3 --level 5 --backup-file=backup/raid-backup-file

But that didn't fix the problem. At boot, the system says the array is degraded and that drive sde is still missing.

After exiting the initramfs shell, logging in and reassembling the array like before, the array is clean; see:

cat /proc/mdstat 
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] 
md0 : active raid5 sdb1[0] sdd1[2] sdc1[1]
      3907026816 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]

I also found in my kernel messages, after drive discovery (sda, sdb, sdc, sde) and network card discovery, this strange line:

md: bind<sde>

Where is that coming from, and how can I change it?

My RAID array should NOT contain any reference at all to /dev/sde.
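
In case it helps with the diagnosis, I would expect a stale superblock on the raw disk to show up with something like:

mdadm --examine /dev/sde     # metadata left on the whole disk, if any
mdadm --examine /dev/sde1    # metadata on the partition itself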

I didn't update the initramfs after the first change to my array; I tried it afterwards, but that didn't change anything.
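
For completeness, this is roughly how the initramfs is rebuilt on Ubuntu:

update-initramfs -u          # rebuild the image for the running kernel
update-initramfs -u -k all   # or rebuild it for every installed kernel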

By the way, here's my /etc/mdadm.conf:

DEVICE partitions

CREATE owner=root group=disk mode=0660 auto=yes

HOMEHOST <system>

MAILADDR root

ARRAY /dev/md0 metadata=0.90 UUID=4d84e24c:e40f825f:3ba42e3c:267295e2
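
If the ARRAY line itself ever needs regenerating, my understanding is that something like this does it (using the config path from my setup):

mdadm --detail --scan                      # prints ARRAY lines for the active arrays
mdadm --detail --scan >> /etc/mdadm.conf   # append after reviewing and removing the old line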

Best Answer

There must be some md metadata remaining on the sde device.

  • Remove sde1 from the raid device.
  • Wipe the md metadata on sde completely with dd (the exact offset depends on the metadata version and where it lives on the disk: 0.90 superblocks sit near the end of the device, while 1.1/1.2 sit at or near the start), something like:

    dd if=/dev/zero of=/dev/sde bs=4096 count=1 seek=1

    or even better:

    mdadm --misc --zero-superblock /dev/sde

  • Re-create your sde1 partition, add it back to the md0 device, and update your mdadm.conf if you still want it to contain UUIDs (you can also restore the previous metadata if you backed it up first). A rough sketch of the whole sequence follows below.
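
Roughly, the whole sequence would look like the sketch below; double-check the device names against your system and back things up first (re-creating the partition itself is left to fdisk/parted):

    mdadm /dev/md0 --fail /dev/sde1 --remove /dev/sde1   # take the partition out of the array, if it is still a member
    mdadm --misc --zero-superblock /dev/sde              # wipe the stale metadata on the whole disk
    # re-create the sde1 partition with fdisk or parted, then:
    mdadm --add /dev/md0 /dev/sde1                       # add the partition back as a member
    mdadm --detail --scan                                # use this output to refresh the ARRAY line in mdadm.conf
    update-initramfs -u                                  # so the boot environment stops binding the raw /dev/sde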