I have a three-disk RAID5 array managed with mdadm, with an XFS filesystem on it. While I was using the machine, it suddenly became unresponsive (new programs wouldn't start, etc.), but it was still up enough for me to open a new xterm and run dmesg. The kernel log showed a large number of SATA link timeouts. After I rebooted the machine, two of the drives (one of the drives in the array along with a DVD drive) were not reported by the BIOS. The problem turned out to be the DVD drive (I had been having problems with it for a while), and disconnecting it made the hard drive visible again. Looking with smartctl, the disk in question (/dev/sdc) seems to be fine, so I don't think this is a disk failure.
The problem is I can't figure out how to reactivate my array. Looking at the partitions /dev/sda3 and /dev/sdb3 (the ones that didn't fail) using mdadm --examine shows that they both, of course, think that /dev/sdc3 is bad/removed, while /dev/sdc3 thinks that it is fine. Worse, the array was being actively written to, so the event counts are different, with sda3 and sdb3 having higher values. (I would be entirely willing to throw away that newly written data, but I don't think that is relevant.)
What is the best course of action for recovery? Running mdadm -A /dev/md2 does nothing, and mdadm --auto-detect does not detect the array.
$ sudo mdadm --query /dev/md2
/dev/md2: is an md device which is not active
$ sudo mdadm --query /dev/sda3
/dev/sda3: device 0 in 3 device undetected raid5 /dev/md2. Use mdadm --examine for more detail.
However, running mdadm --examine --scan -c none does print the array with the correct UUID, so clearly it is finding it. Here is the relevant part of /proc/mdstat, showing all drives as spares:
md2 : inactive sda3[0](S) sdc3[2](S) sdb3[1](S)
811868544 blocks
I find it quite curious that a single drive failure in a RAID5 has apparently led to my array being inaccessible. 🙁
What's the best course of action here?
Best Answer
If you see the array in /proc/mdstat, then the array is assembled; you need to start it:
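One way to do that is mdadm's --run (-R) option, which starts a partially assembled array. The following is only a sketch: it assumes the array is /dev/md2 as in the question, and start_array is an illustrative wrapper, not an mdadm command:

```shell
# Assumption: the inactive array is /dev/md2, as in the question.
# start_array is a hypothetical wrapper, not part of mdadm.
start_array() {
    md=${1:-/dev/md2}
    if [ ! -b "$md" ]; then
        echo "no such block device: $md" >&2
        return 1
    fi
    # --run (-R) starts a partially assembled array, degraded if necessary.
    sudo mdadm --run "$md"
}

# Usage (needs root and the real array):
#   start_array /dev/md2
```

The wrapper only adds a check that the device node exists; the actual work is the single mdadm --run call.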
If it doesn't start, try to re-run the command with the -v switch (verbose) and post the result.
Once it is activated, you should be able to check its status and re-add sdc3 if needed.
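As a sketch of that last step, assuming the array is /dev/md2 and the dropped member is /dev/sdc3 as in the question (check_and_readd is an illustrative wrapper, not an mdadm command):

```shell
# Assumptions: array /dev/md2 and member /dev/sdc3, as in the question.
# check_and_readd is a hypothetical wrapper, not part of mdadm.
check_and_readd() {
    md=$1
    part=$2
    if [ ! -b "$md" ] || [ ! -b "$part" ]; then
        echo "missing block device: $md or $part" >&2
        return 1
    fi
    cat /proc/mdstat              # quick overview of all md arrays
    sudo mdadm --detail "$md"     # per-member state of this array
    # Put the member back in its old slot; md then recovers its data.
    sudo mdadm "$md" --re-add "$part"
}

# Usage (needs root and the real devices):
#   check_and_readd /dev/md2 /dev/sdc3
```

If mdadm refuses the --re-add, the usual fallback is --add, which is expected to rebuild the whole member instead of only the out-of-date parts.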