Mdadm failure detected on one partition

drive-failure mdadm raid

I received this rather nice email today suggesting one of the drives in a RAID1 array has failed.

A Fail event had been detected on md device /dev/md4.

It could be related to component device /dev/sdc2.

Faithfully yours, etc.

P.S. The /proc/mdstat file currently contains the following:

Personalities : [raid1]
md4 : active raid1 sdd2[1] sdc2[2](F)
      87667136 blocks [2/1] [_U]

md3 : active raid1 sdd1[1] sdc1[0]
      250304 blocks [2/2] [UU]

The strange thing is that only one partition, sdc2, is marked as failed; the other partition on the same drive (sdc1, in md3) has not failed.

Since the server is in another country, I can't physically inspect it. Any suggestions as to how to test whether this is really a failure or just a glitch?

Best Answer

If your hard drive is SMART-enabled, start a long self-test:

smartctl -t long /dev/sdc

After one or two hours, print the full report:

smartctl -a /dev/sdc

and check whether it reports any errors.
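The full smartctl -a report is long, so it can help to pull out the counters that most often go with failing media. The following is only a sketch, assuming an ATA drive whose smartctl -A table has the usual ten columns with the raw value last; suspect_attributes is a hypothetical helper name, and the three attribute names are a common choice rather than a definitive list.

```shell
#!/bin/sh
# Sketch only: assumes ATA-style `smartctl -A` output (10 columns, raw value in column 10).

# suspect_attributes (hypothetical helper): read `smartctl -A` output on stdin
# and print NAME=RAW for selected attributes whose raw value is above zero.
suspect_attributes() {
    awk '$2 ~ /^(Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable)$/ && $10 + 0 > 0 { print $2 "=" $10 }'
}

# Typical use, as root, once the long self-test has finished:
#   smartctl -l selftest /dev/sdc               # results of past self-tests
#   smartctl -H /dev/sdc                        # overall health verdict
#   smartctl -A /dev/sdc | suspect_attributes   # non-zero counters, if any
```

No output from the last command means those counters are all zero, which would make a transient glitch more plausible than worn media, though it does not prove the drive is healthy.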

Related Solutions

Software MDADM RAID 5 – Inactive MD0 showing

Looks like your drives are all being reported as [S]pares. You should check your logs (dmesg, /var/log/messages) to see if there's any indication why this happened.
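To narrow down the logs mentioned above, a coarse filter is usually enough to start with. This is a rough sketch; md_events is a hypothetical helper name, and the pattern is an assumption that will also match unrelated lines containing similar text.

```shell
#!/bin/sh
# md_events (hypothetical helper): keep only lines on stdin that mention
# an md device, RAID, or an I/O error. Deliberately coarse.
md_events() {
    grep -iE 'md[0-9]+|raid|I/O error'
}

# Typical use:
#   dmesg | md_events
#   md_events < /var/log/messages
```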

Try running the following:

sudo mdadm --examine --scan --config=/etc/mdadm/mdadm.conf

and look at the output. If it prints something like this:

ARRAY /dev/md0 level=raid5 metadata=1 num-devices=3 UUID=22f22c3599:613d5231:d407a655:bdeb84 name=backup:1

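Then you can append it to the bottom of mdadm.conf. A plain >> redirect is performed by your own shell rather than by sudo, so it fails unless you are already root; pipe through sudo tee -a instead:

sudo mdadm --examine --scan --config=/etc/mdadm/mdadm.conf | sudo tee -a /etc/mdadm/mdadm.conf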
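Appending the scan output more than once would leave duplicate ARRAY lines in mdadm.conf. The following is a minimal sketch of a guard, assuming each ARRAY line carries a UUID= field; append_new_arrays is a hypothetical helper name, not part of mdadm.

```shell
#!/bin/sh
# append_new_arrays CONF (hypothetical helper): read `mdadm --examine --scan`
# output on stdin and append to CONF only the ARRAY lines whose UUID is not
# already present in CONF. CONF is expected to exist already.
append_new_arrays() {
    conf=$1
    while IFS= read -r line; do
        case $line in
            ARRAY*) ;;          # only ARRAY lines are of interest
            *) continue ;;
        esac
        # Extract the value of the UUID= field, if there is one.
        uuid=$(printf '%s\n' "$line" | sed -n 's/.*UUID=\([^ ]*\).*/\1/p')
        [ -n "$uuid" ] || continue
        if ! grep -qF "UUID=$uuid" "$conf"; then
            printf '%s\n' "$line" >> "$conf"
        fi
    done
}

# Typical use, from a root shell:
#   mdadm --examine --scan | append_new_arrays /etc/mdadm/mdadm.conf
```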

Then try starting the array:

sudo mdadm -A /dev/md0

Good luck.

Linux – mdadm raid5 failure. set wrong drive to faulty by accident

i tried to re-create the raid with --assume-clean but this did not work.

This is what you should do. What do you mean by "did not work"? What was the message? What happened? Did you call mdadm with ALL of the original RAID array partitions?

what could i do to recover my data?

Restore from backup. If you have no backup, that's a well-deserved lesson (this is harsh and not intended to be a joke at all).

Edit: given that this is an encrypted volume, you have absolutely zero chance of restoring any data unless you can get the RAID working correctly. Can you post just the content of /proc/mdstat? I don't understand the current state (your message mentions 2 failed drives, but only 1 failed drive is shown).
