ProLiant RAID 1 Rebuild Questions

data-recovery hardware-raid hp-proliant raid raid1

I have an HP ProLiant ML350 G5 server that experienced a power supply failure overnight. The power supply was replaced, but unfortunately the server was restarted with only one disk of the RAID 1 set plugged in. (The RAID controller is the built-in E200i.)

The RAID BIOS then reported on start-up that it had entered Interim Recovery Mode. I would have expected the server to still boot with only one drive. Instead, the BIOS says that it cannot find a C: drive and enters a reboot loop, polling the other boot devices. My first question: is it normal for the server not to boot from one disk?

The second drive was then plugged in (all drives are OK) and the RAID BIOS started an automatic rebuild onto that disk. This appears to be a background process, as no progress is shown, but judging by the flashing light it is working. My second question: how long will this rebuild take? (36GB 15K SAS drive.)

I cannot see any error messages and the drive appears to be rebuilding OK, but the computer still will not start up. During boot it still says that the C: drive is not found. If I wait for the rebuild to finish, is it likely to fix itself and find the C: drive? Or is there some other problem here?

Answers

These are the conclusions I made after solving this issue.

1) No, it is not normal. On our system (as on most others), if one of the RAID 1 disks is missing or in the process of being rebuilt, the single remaining disk should still operate fine and boot correctly. (The controller does drop into a reduced-performance mode, though.)

2) The RAID 1 rebuild on our system took about 4.5 hours to reconstruct the disk after it was put back in. Seemed like a long time to me for a RAID 1+0 36GB 15k rpm SAS drive that wasn't being used at the time. But that's what it took. (As an experiment, I pulled and replaced a 10k rpm 146GB SAS drive from this machine's companion RAID 5 array which uses 4 disks. It took less than 2 hours. Go figure.)
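For a rough sense of scale, those two timings can be turned into an average rebuild rate. This is only a back-of-the-envelope sketch: it assumes decimal gigabytes (1 GB = 1000 MB) and that the controller copies the full capacity of the drive, and the function name is purely illustrative. Since the RAID 5 drive took "less than 2 hours", its figure is a lower bound.

```python
def rebuild_rate_mb_per_s(capacity_gb: float, hours: float) -> float:
    """Average rebuild rate in MB/s, assuming the whole drive is rewritten.

    Assumes decimal units: 1 GB = 1000 MB.
    """
    if hours <= 0:
        raise ValueError("hours must be positive")
    megabytes = capacity_gb * 1000
    seconds = hours * 3600
    return megabytes / seconds


# 36GB RAID 1 member, about 4.5 hours
print(f"RAID 1: {rebuild_rate_mb_per_s(36, 4.5):.1f} MB/s")   # RAID 1: 2.2 MB/s

# 146GB RAID 5 member, under 2 hours (so at least this rate)
print(f"RAID 5: {rebuild_rate_mb_per_s(146, 2):.1f} MB/s")    # RAID 5: 20.3 MB/s
```

So the small mirror rebuilt at roughly a ninth of the rate of the RAID 5 drive, which is why the 4.5 hours felt long.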

3) The fundamental problem I was having with this machine turned out to be corruption in the machine's NVRAM. I can only assume the power supply fault was responsible for corrupting it, although there were no obvious signs in the BIOS of anything being wrong. All the settings looked as they should. However, after clearing the NVRAM via the S6 switch on the motherboard, the system booted without problem. I guess the referenced boot controller had somehow changed in some underlying BIOS setting. (Incidentally, if you do this, don't forget to reset the date and time before letting your server get carried away with receiving mail and missing backups.)

Best Answer

You've got something funky going on there, though I'm not exactly sure what it is.

The server should boot and operate normally with just one drive in it. All that should happen is that the controller marks the array as degraded, but operating systems don't care (or even know) about this condition and should carry on as normal.

With regard to the rebuild, ordinarily I'd say look at the HP Array Diag Utility, as that will give you some indication of rebuild progress. Since the operating system sounds hosed at this point, the BIOS may have some rudimentary way of configuring arrays and displaying their status. Failing that, you should be able to boot off a SmartStart CD, which contains the HP Array Diag Utility. A 36GB drive should rebuild relatively quickly - I've seen a 36GB RAID 1 on an ML370 rebuild in a morning.

Is it definitely the BIOS telling you drive C: isn't found? C: is a very Windows thing, and I'd be surprised if a BIOS used a Windows-centric term like that when other operating systems can be installed (it may well do, it just strikes me as odd).