Linux – the procedure for reactivating a failed mdadm RAID5 array


I have a three-disk RAID5 array managed with mdadm, with an XFS filesystem on it. While I was using the machine, it suddenly became unresponsive (new programs wouldn't start, etc.), but it was still usable enough for me to open a new xterm and run dmesg. The kernel log showed a large number of SATA link timeouts. After I rebooted, the BIOS did not report two of the drives: one of the drives in the array, and a DVD drive. The problem turned out to be the DVD drive (I had been having problems with it for a while), and disconnecting it made the hard drive visible again. According to smartctl, the disk in question (/dev/sdc) seems to be fine, so I don't think this is a disk failure.

The problem is that I can't figure out how to reactivate my array. Running mdadm --examine on the partitions that didn't fail, /dev/sda3 and /dev/sdb3, shows that both of them, as expected, consider /dev/sdc3 bad/removed, while /dev/sdc3 considers itself fine. Worse, the array was being actively written to at the time, so the event counts differ: sda3 and sdb3 have higher values. (I would be entirely willing to throw away that newly written data, but I don't think that is relevant.)
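One way to compare the event counts side by side is a small shell function. It is only a sketch: it assumes mdadm --examine prints a line of the form "Events : <n>" (the layout differs somewhat between mdadm versions and metadata formats), and the show_event_counts name and the MDADM variable are hypothetical, added so the parsing can be tried without real disks.

```shell
#!/bin/sh
# Minimal sketch: print the superblock event count recorded on each member,
# so the stale one (lowest count) stands out.
# Assumption: `mdadm --examine DEV` prints a line like "Events : 1234"; the
# exact layout varies between mdadm versions and metadata formats.
# MDADM is a hypothetical override so the parsing can be tried without disks.
MDADM=${MDADM:-mdadm}

show_event_counts() {
    for dev in "$@"; do
        # Keep the text after "Events :" from the first matching line.
        events=$("$MDADM" --examine "$dev" 2>/dev/null |
            sed -n 's/^[[:space:]]*Events[[:space:]]*:[[:space:]]*//p' |
            head -n 1)
        printf '%s\t%s\n' "$dev" "${events:-unknown}"
    done
}
```

Run it as root against the three members, e.g. show_event_counts /dev/sda3 /dev/sdb3 /dev/sdc3; going by the description above, sdc3 should report a lower count than the other two.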

What is the best course of action for recovery? Running mdadm -A /dev/md2 does nothing, and mdadm --auto-detect does not detect the array.

$ sudo mdadm --query  /dev/md2
/dev/md2: is an md device which is not active
$ sudo mdadm --query  /dev/sda3
/dev/sda3: device 0 in 3 device undetected raid5 /dev/md2.  Use mdadm --examine for more detail.

However, running mdadm --examine --scan -c none does print the array with the correct UUID, so it is clearly finding it. Here is the relevant part of /proc/mdstat, which shows all of the drives as spares:

md2 : inactive sda3[0](S) sdc3[2](S) sdb3[1](S)
      811868544 blocks
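The inactive state and the (S) spare flags can also be picked out of /proc/mdstat with a short script. This is a sketch that assumes array lines have the shape shown in the excerpt above (name, colon, state, then members such as sda3[0](S)); inactive_arrays and the MDSTAT override are hypothetical names.

```shell
#!/bin/sh
# Minimal sketch: report md arrays that /proc/mdstat lists as "inactive",
# together with the member devices flagged as spares "(S)".
# Assumption: array lines look like "md2 : inactive sda3[0](S) sdb3[1](S)".
# MDSTAT is a hypothetical override pointing at a saved copy of the file.
MDSTAT=${MDSTAT:-/proc/mdstat}

inactive_arrays() {
    awk '
        $2 == ":" && $3 == "inactive" {
            spares = ""
            for (i = 4; i <= NF; i++)
                if ($i ~ /\(S\)$/) {
                    dev = $i
                    sub(/\[.*$/, "", dev)   # "sda3[0](S)" -> "sda3"
                    spares = spares " " dev
                }
            print $1 ":" spares
        }
    ' "$MDSTAT"
}
```

With the /proc/mdstat contents quoted above, it would print md2: sda3 sdc3 sdb3.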

I find it quite curious that a single drive failure in a RAID5 has apparently led to my array being inaccessible. 🙁

What's the best course of action here?

Best Answer

If the array appears in /proc/mdstat, it has already been assembled (here it is listed as inactive); you just need to start it:

sudo mdadm -R /dev/md2

If it doesn't start, re-run the command with the -v (verbose) switch and post the output.
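A hedged sketch of that step: start the array with mdadm -R -v, then check whether /proc/mdstat now reports it as active. The start_array function and the MDADM and MDSTAT overrides are hypothetical names, and the check only looks at the state word that follows "md2 :".

```shell
#!/bin/sh
# Minimal sketch: start an assembled-but-inactive array with -R (--run),
# verbosely, then report the state /proc/mdstat shows for it.
# Assumptions: MDADM and MDSTAT are hypothetical overrides used for testing.
MDADM=${MDADM:-mdadm}
MDSTAT=${MDSTAT:-/proc/mdstat}

start_array() {
    array=$1                 # e.g. /dev/md2
    name=${array##*/}        # strip the directory: md2
    "$MDADM" -R -v "$array"
    # The third field of "md2 : active raid5 ..." is the array state.
    status=$(awk -v n="$name" '$1 == n && $2 == ":" { print $3 }' "$MDSTAT")
    if [ "$status" = active ]; then
        echo "$name is active"
    else
        echo "$name is ${status:-missing}; post the mdadm -v output above" >&2
        return 1
    fi
}
```

Source the function in a root shell and call start_array /dev/md2; a non-zero return means the verbose mdadm output is worth posting.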

Once it is activated, you should be able to check its status and re-add sdc3 if needed.
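For that follow-up, a sketch of the usual commands: mdadm --detail to see which members are active, then, only if sdc3 is listed as removed, mdadm --re-add to put it back. ARRAY, MEMBER, DRY_RUN and the function names are assumptions; with DRY_RUN=1 the commands are printed instead of executed.

```shell
#!/bin/sh
# Minimal sketch of the follow-up once the array is running.
# Assumptions: device names are the ones from the question; DRY_RUN=1 is a
# hypothetical switch that prints each command instead of running it.
ARRAY=${ARRAY:-/dev/md2}
MEMBER=${MEMBER:-/dev/sdc3}

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"      # dry run: show the command only
    else
        "$@"
    fi
}

# Show the array state and per-member status, then the kernel's summary.
show_status() {
    run mdadm --detail "$ARRAY"
    run cat /proc/mdstat
}

# Only needed if --detail lists the member as removed. --re-add asks md to
# return the device to its old slot; if mdadm refuses, read its message
# before trying --add, which brings the device back as a new member that
# is rebuilt from the two remaining disks.
readd_member() {
    run mdadm "$ARRAY" --re-add "$MEMBER"
}
```

Running DRY_RUN=1 first shows exactly what would be executed; the rebuild progress then appears in /proc/mdstat.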