RAID 10 can’t rebuild: rebuild fails and the replacement HDD goes missing

hardware-raid raid raid10

First, I would like to describe my server's RAID configuration: Intel RAID 10 (4 × 1 TB HDDs), two from WD, one from Hitachi, and one from Seagate, plus another Seagate as a hot spare.
Here is the problem: the last HDD (the Seagate) failed, and even though there was a hot spare, it was not swapped in automatically. When I noticed, I manually put the hot spare in the place of the failed HDD. After turning the server on I saw it was rebuilding, but later it gave the error message "Rebuild failed due to target drive error". I rebooted the server and could not find the replaced HDD in the RAID list; because of other work I left it like that. When I turned it on today, I got an error message from the BIOS, and that HDD was also added to its list. The log said "If you believe these PDs do not contain a desired config., pls. power off the system, remove these PDs and reboot." and the timestamps were only 01, 02, 03.
Sorry for my English.

Best Answer

  1. If your file systems report that they're intact and the data is not corrupted, you're lucky. Make a full backup NOW! This is always the first thing you should do when you see that your storage system is messed up, i.e. behaving strangely.
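As a minimal sketch of that backup step, assuming the file system is reachable from a Linux environment and mounted at a hypothetical /mnt/array, with an external backup disk at /mnt/backup (both paths are placeholders, adjust them to your setup):

    # Hypothetical mount points; adjust to your system.
    # -aAXH preserves permissions, ACLs, extended attributes and hard links,
    # -v shows what is being copied.
    rsync -aAXHv /mnt/array/ /mnt/backup/array-$(date +%F)/

On Windows, use whatever file-level backup tool you trust; the important part is that the copy ends up on a disk outside the troubled array.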

  2. I didn't completely understand which RAID technology you are using. Intel does not make RAID chips; they offer either rebranded LSI MegaRAID controllers or "Intel Matrix Storage Manager" (IMSM). The former is good, and it would be strange to see the problems you describe with it. The latter is fake RAID, which is known to be quite unstable and unreliable. I can remember only a few cases where it really survived a rebuild, and many cases where an IMSM RAID did help us save data, but gave us much more headache.

In the latter case I suggest you not repair the current setup, but migrate from IMSM either to a true hardware RAID or to a completely software-defined array. Windows has this ability when drives are converted to dynamic disks, and Linux software RAID is well known for its flexibility and reliability.
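If you take the Linux software RAID route, a minimal sketch of creating a 4-disk RAID 10 with one hot spare using mdadm might look like the following. The device names /dev/sdb through /dev/sdf are placeholders, and the command destroys whatever is on those disks, so it only makes sense after the backup above:

    # Placeholder device names -- verify with lsblk before running anything.
    # WARNING: this destroys existing data on the listed disks.
    mdadm --create /dev/md0 --level=10 --raid-devices=4 --spare-devices=1 \
          /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf

    # Watch the initial sync (and any later rebuilds):
    cat /proc/mdstat

    # Persist the array definition so it assembles at boot
    # (the config file path varies by distribution):
    mdadm --detail --scan >> /etc/mdadm/mdadm.conf

With a setup like this, a failed member is replaced with mdadm's --fail / --remove / --add options instead of depending on the controller BIOS.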

  3. If you don't want to migrate away from fake RAID, I suggest you completely erase the metadata from the spare drives and then use them as spares again. This should make every part of the system forget that those drives were ever there. You have to eject the spare, connect it to another computer and fill it with zeroes there (in Linux I use dd if=/dev/zero of=/dev/sdX; I can't suggest a solution for Windows, google for that); then it can be tried in the array again.
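A hedged sketch of that erase step, assuming the ejected spare shows up as /dev/sdX on the second machine (double-check the device name with lsblk; everything below is destructive):

    # Identify the right disk first; zeroing the wrong one loses data.
    lsblk -o NAME,SIZE,MODEL,SERIAL

    # Option 1: overwrite every byte -- thorough but slow (hours for a 1 TB drive):
    dd if=/dev/zero of=/dev/sdX bs=1M status=progress

    # Option 2: remove only the known RAID/filesystem signatures -- much faster.
    # IMSM keeps its metadata near the end of the disk, and wipefs detects it:
    wipefs -a /dev/sdX

Either way, the controller should treat the drive as brand new the next time it sees it.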

  4. Ignore claims that your not-exactly-identical drives are a problem. All redundant-array MTBF calculations assume that drives die independently and occasionally. If you use identical drives from the same vendor under the same load (as is often the case in new servers), they will share the same manufacturing traits and defects, and the causes of the drive failures will be the same. So if one of them fails, you should expect the others to fail shortly afterwards, i.e. not independently. The usual array reliability assumptions are completely wrong if you use similar drives! I've seen systems where the spare kicked in but another drive died during the rebuild, so the array only made data retrieval harder, just because someone had installed all-identical drives!

However, if you knowingly use different drives, those drives can be assumed not to share the same traits and defects; they will fail truly independently. The well-known array reliability calculations and expectations are correct only in this case! So if you want true redundancy, and not just a glamorous picture of shelves with thousands of identical drives, you will always end up using different drives. And kill with fire anyone who suggests you use "the same drives from the same manufacturer and the same series".
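To put a toy number on that independence assumption (illustrative figures only, not real MTBF data): suppose each drive has about a 3% chance of failing in a given year and a degraded rebuild takes roughly a day. If failures really were independent, the chance of losing the surviving mirror partner during that window would be tiny:

    # Toy illustration only: 3% annual failure rate, 1-day rebuild window.
    awk 'BEGIN { afr = 0.03; printf "P(partner dies during a 1-day rebuild) ~ %.5f\n", afr / 365 }'

Drives from one batch that wear out together can be far worse than this, which is exactly why that calculation stops holding for identical drives.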