I have a Dell T7500 with a PERC H710P controller connected to four 3 TB drives in a RAID 5 array. Two 256 GB SSDs are also connected to the controller but are not configured in an array. A Linux server is installed on one of the SSDs, and all my user data are stored on the RAID 5 array.
The other day, upon boot, the RAID BIOS reported these errors:
Drives 01 and 03 missing
Foreign config available
I loaded the foreign config, and the drives reappeared. On the next boot, I got:
Drive 01 offline
Thinking the drive was bad, I replaced it with a new one and rebuilt drive 01. On the next boot the system came up OK, but a few reboots later I got:
Drive 00 offline
Foreign config available
So I read in the foreign config and forced drive 00 online.
After several more reboots I then got:
Drive 03 offline
Foreign config available
I read in the foreign config again and forced drive 03 online.
Now the system comes up OK. I have rebooted it many times.
Should I assume that my controller is bad?
Or, put another way, could this kind of behavior be caused by something other than the controller? For example, could the kernel driver somehow muck up the drive configuration?
Best Answer
Yes, I believe either your controller or the RAID backplane is bad, and I suspect the controller is the culprit. Can you look up the firmware version of the RAID controller (not to be confused with the system BIOS, which you should also check) and compare it to what is available on Dell's site? You may find that the version is quite old and that critical issues have been fixed in newer releases. Alternatively, call Dell support, which you should certainly do if the machine is still covered! You can check which service contract is in force by looking up the Service Tag at support.dell.com.
Two notes of caution, because you are in dangerous territory: 1) Upgrading the RAID controller firmware can sometimes result in data loss, so make sure the new version has been out for a while and read the release notes carefully. 2) RAID 5 doesn't give you much wiggle room. Either way, back up your critical data before you let more time pass on this issue or take any substantial corrective action!
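To see why RAID 5 leaves so little margin, and why the very first report of two members missing (01 and 03) at once was serious for a 4-drive array, here is a toy Python sketch of single-parity XOR reconstruction. It is only an illustration under simplifying assumptions: a single stripe, parity kept on the last member, and hypothetical helper names `xor_blocks`, `make_stripe`, and `rebuild`. A real PERC controller rotates parity across the drives and uses its own on-disk format.

```python
from functools import reduce
from typing import List, Optional


def xor_blocks(blocks: List[bytes]) -> bytes:
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data_blocks: List[bytes]) -> List[bytes]:
    """Toy stripe: data members followed by one XOR parity member (no rotation)."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def rebuild(stripe: List[Optional[bytes]]) -> List[bytes]:
    """Recover a stripe in which missing members are marked as None."""
    missing = [i for i, block in enumerate(stripe) if block is None]
    if len(missing) > 1:
        # Single parity gives one equation per byte, so only one unknown is solvable.
        raise ValueError(
            f"{len(missing)} members missing; single parity can recover at most one"
        )
    repaired = list(stripe)  # do not modify the caller's list
    if missing:
        survivors = [block for block in stripe if block is not None]
        # XOR of all surviving members (data and parity) equals the missing member.
        repaired[missing[0]] = xor_blocks(survivors)
    return repaired


if __name__ == "__main__":
    # Four members, like the 4-drive array: three data blocks plus parity.
    stripe = make_stripe([b"AAAA", b"BBBB", b"CCCC"])

    one_lost = list(stripe)
    one_lost[1] = None
    print(rebuild(one_lost) == stripe)  # one member lost: recoverable

    two_lost = list(stripe)
    two_lost[1] = None
    two_lost[3] = None
    try:
        rebuild(two_lost)
    except ValueError as err:
        print(err)  # two members lost: the data cannot be reconstructed
```

With one member gone the stripe can be rebuilt from the other three, but a second failure (or a second drive dropping offline mid-rebuild) leaves two unknowns and nothing to solve them with, which is why a backup matters before any further experimentation.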