Linux – Recover RAID-5 that was already running in degraded mode (lost a second disk)

linux mdadm raid raid5 ubuntu

This is silly: this has happened before, I figured out how to fix it, and everything was fine afterwards.

I'm running four 500 GB SATA drives in a RAID-5 on Ubuntu 7.10 server. One of the disks failed (actually I think it's one of the connectors in the hot-swap cage), and the array has been running on three disks while I find a replacement HDD or diagnose the problem further.

Now, before you read any further: no, I do not have backups, and the information is not super important, just nice to have.

Anyway once before, I had some kind of HW hiccup, maybe the power went out or something, and I had problems recovering the array. It wasn't that one of the disks failed, it was something else.

I was able to simply add back in the second "failed" disk and in a few minutes, I was back up and running. Maybe I had to run some kind of filesystem check, I don't know.

I spent hours, if not days, figuring out how to do it last time and have since forgotten.

The crux of the issue is that if I run mdadm --examine on sdb, sdc, and sdd, sdd thinks it's still part of the array, but the superblock info on sdb and sdc lists sdd as removed.

sda is the disk that failed long before; it's listed correctly in all of them as faulty removed.

TIA. The server in question is not on the internet so it's not possible to C&P the output of various commands on to the forum.

I know, by now a lot of you probably think I'm a nitwit, or worse. However I do recollect that once I figured out the series of commands to run, it was a fairly straightforward procedure and it worked great.

Best Answer

Provided the drives have not actually failed but have only become temporarily unavailable, or have come out of sync for some other reason, you can try to force the RAID online, ignoring the event count and update time recorded in each member's superblock.

By doing this you run the risk of corrupting data, especially if you don't know which drive went offline last - but it sounds like you have little choice.
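To get an idea of which drive dropped out last, it can help to compare the "Events" and "Update Time" fields that mdadm --examine prints for each member before forcing anything. The following is only a rough sketch: the helper names field_of and show_members are made up for illustration, and the device names are taken from the question, so substitute partitions such as /dev/sdb1 if those are the actual array members.

```shell
#!/bin/sh
# Print the value of one "Name : value" field from `mdadm --examine`
# output read on stdin (field_of is a hypothetical helper name).
field_of() {
    awk -F' : ' -v name="$1" '
        { key = $1; gsub(/^[ \t]+|[ \t]+$/, "", key) }
        key == name { print $2; exit }'
}

# Assumed member devices, taken from the question; adjust as needed.
MEMBERS="/dev/sdb /dev/sdc /dev/sdd"

# Print device, event count, and update time for each member.
# Needs root and mdadm; nothing is written to the disks.
show_members() {
    for dev in $MEMBERS; do
        info=$(mdadm --examine "$dev")
        printf '%s  events=%s  updated=%s\n' "$dev" \
            "$(printf '%s\n' "$info" | field_of Events)" \
            "$(printf '%s\n' "$info" | field_of 'Update Time')"
    done
}
```

Running show_members as root prints one line per device. The member with the lowest event count and the oldest update time is most likely the one that went offline first, and therefore the one holding the stalest data.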

Read up on the various ways to use the --force option in the mdadm man page.

If one of the drives has actually failed and another is out of sync, you can still bring the RAID online by supplying "missing" as the device ID for the failed drive, combined with the --force option. This should start the RAID in degraded mode.
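As a rough illustration of a forced, degraded start, here is a sketch that takes a slightly different route: the mdadm man page describes the "missing" keyword under create mode, so instead this simply leaves the failed drive (sda in the question) out of the device list and uses --assemble with --force and --run. The array name /dev/md0 is an assumption, the device names come from the question, and the run and force_assemble helpers are made up for illustration. By default it only prints the commands so they can be reviewed first.

```shell
#!/bin/sh
# Assumed names: the array is /dev/md0 and the surviving members are
# whole disks as in the question; adjust both to the real setup.
ARRAY=/dev/md0
MEMBERS="/dev/sdb /dev/sdc /dev/sdd"

# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

force_assemble() {
    # Stop any half-assembled, inactive instance of the array first.
    run mdadm --stop "$ARRAY"
    # --force accepts members whose metadata looks out of date;
    # --run starts the array even with fewer devices than before.
    # $MEMBERS is deliberately unquoted so it splits into separate args.
    run mdadm --assemble --force --run "$ARRAY" $MEMBERS
    # Check whether the array came up, and in what state.
    run cat /proc/mdstat
}
```

Calling force_assemble on its own prints the three commands; calling it as root with DRY_RUN=0 set in the environment actually executes them. If the array does start, checking the filesystem read-only before mounting it read-write seems prudent, given the corruption risk mentioned above.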