Device in software RAID 10 array: clean, degraded. Ouch

mdadm raid10 software-raid

I've got 4x 500GB drives in software RAID.
/dev/md0 is RAID 1 and mounted to /boot
/dev/md1 is RAID 10 and is swap
/dev/md2 is RAID 10 and is the main system and data device

I looked at mdadm this evening and noticed on md2…

State : clean, degraded
Number   Major   Minor   RaidDevice State
0       8        3        0      active sync   /dev/sda3
1       0        0        1      removed
2       8       35        2      active sync   /dev/sdc3
3       8       51        3      active sync   /dev/sdd3
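For anyone who wants to spot this automatically rather than by eye, here is a minimal Python sketch that parses the device table above and reports any slot that is not `active sync`. The names `parse_device_table` and `unhealthy_slots` are made up for illustration, and the column layout is assumed to match the rows pasted here; other mdadm versions may print the table slightly differently.

```python
import re

# Device rows copied from the `mdadm --detail /dev/md2` output above.
SAMPLE = """\
0       8        3        0      active sync   /dev/sda3
1       0        0        1      removed
2       8       35        2      active sync   /dev/sdc3
3       8       51        3      active sync   /dev/sdd3
"""

# Number, Major, Minor, RaidDevice, then the free-form state (and device path).
ROW = re.compile(r"^\s*(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(.*?)\s*$")


def parse_device_table(text):
    """Return one dict per device row; header and blank lines are skipped."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if not m:
            continue
        _number, _major, _minor, slot, rest = m.groups()
        # The state is whatever precedes the device path, if there is one.
        state, _, device = rest.partition("/dev/")
        rows.append({
            "slot": int(slot),
            "state": state.strip(),
            "device": "/dev/" + device if device else None,
        })
    return rows


def unhealthy_slots(rows):
    """Slots whose state is anything other than 'active sync'."""
    return [r["slot"] for r in rows if r["state"] != "active sync"]


rows = parse_device_table(SAMPLE)
print(unhealthy_slots(rows))  # [1]
```

On the table above this flags RaidDevice 1, the slot that /dev/sdb3 used to occupy.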

On md0 and md1, all drives are shown as active sync and the array state is clean.

Here is the full output from mdadm for each device, along with the output from /proc/mdstat:
http://pastebin.com/VL0uYdU9

So it looks like /dev/sdb1 and /dev/sdb2 are functioning in /dev/md0 and /dev/md1 respectively.
But /dev/sdb3 has dropped out (apparently it's been removed) from /dev/md2

With RAID 10 I believe the data is OK unless I also lose the drive that mirrors the missing one. I am of course backing up to an external device and have verified that the backups are good.
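The reasoning about which second failure would be fatal can be sketched in a few lines of Python. This assumes the array uses the `near` layout with 2 copies (the mdadm default for RAID 10, but not confirmed here since the layout line is only in the pastebin), in which case adjacent RaidDevice slots (0,1) and (2,3) hold the same data. The function name `array_survives` is hypothetical.

```python
def array_survives(failed_slots, raid_devices=4, copies=2):
    """True if every mirror group still has at least one working member.

    Assumes a 'near' layout where raid_devices is a multiple of copies,
    so slots [0, 1] form one mirror group and [2, 3] the other.
    """
    failed = set(failed_slots)
    for start in range(0, raid_devices, copies):
        group = set(range(start, start + copies))
        if group <= failed:  # every copy in this group is gone
            return False
    return True


print(array_survives([1]))     # True: the current state, slot 1 removed
print(array_survives([1, 2]))  # True: one member lost from each group
print(array_survives([1, 0]))  # False: both copies in group (0, 1) lost
```

Under that assumption, the critical device right now is whatever sits in slot 0, which is /dev/sda3 in the output above.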

I've done some log grepping and noticed this pair of log lines…

Dec  9 04:25:37 hostname smartd[3199]: Device: /dev/sdb, 1 Currently unreadable (pending) sectors
Dec  9 04:25:37 hostname smartd[3199]: Device: /dev/sdb, 1 Offline uncorrectable sectors

These lines repeat every 30 minutes. It appears this has been the case for a while, and it looks like the drive has failed a SMART data check.
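To see whether those counts are creeping up over time rather than just repeating, a small sketch like the one below can pull the numbers out of the smartd lines. The regular expression is written against the two message formats quoted above only, and `latest_counts` is an illustrative name, not part of any tool.

```python
import re
from collections import defaultdict

# The two smartd lines quoted above.
LOG = """\
Dec  9 04:25:37 hostname smartd[3199]: Device: /dev/sdb, 1 Currently unreadable (pending) sectors
Dec  9 04:25:37 hostname smartd[3199]: Device: /dev/sdb, 1 Offline uncorrectable sectors
"""

PATTERN = re.compile(
    r"smartd\[\d+\]: Device: (\S+), (\d+) "
    r"(Currently unreadable \(pending\)|Offline uncorrectable) sectors"
)


def latest_counts(log_text):
    """Map each device to the most recent count seen for each sector type."""
    counts = defaultdict(dict)
    for line in log_text.splitlines():
        m = PATTERN.search(line)
        if m:
            device, n, kind = m.groups()
            counts[device][kind] = int(n)  # later lines overwrite earlier ones
    return dict(counts)


print(latest_counts(LOG))
```

Feeding it the whole log instead of two lines, and comparing the result from week to week, would show whether the pending count is growing.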

On Jan 7th an idiot user rebooted the server, thinking it would solve a mail relay problem.
Here's the boot log from /var/log/messages: http://pastebin.com/jGVsDD54

Why do /dev/sdb1 and /dev/sdb2 appear to be functioning ok and just /dev/sdb3 failed?
Just a particular failed sector that happens to be on sdb3?

Is it worth attempting to re-add this partition to the md2 array?
Or should I bin the drive and replace with a fresh drive?

Best Answer

A SMART failure indicates that an overall drive failure is imminent (the timeframe is impossible to predict, however); replace this drive ASAP.
