RAID-1 drive failure: will the missing data be “rebuilt”?

raid windows-server-2003

We are a small company with an old Dell PowerEdge 830 with a CERC 6ch RAID controller. The server is our file server, domain controller (Windows Server 2003), MySQL server, etc. We have a sysadmin we have worked with for a couple of years who usually keeps things working well for us, but he's out of the country and unreachable right now.

Yesterday I received a call from my manager saying that the server was sounding an alarm, quite loud, that would not stop. No one at the office complained of errors saving files to the server or reading files. I came into the office, did some googling, and determined that the alarm was related to the RAID and that there was a BIOS setting to silence it (until we can replace the bad drive). Oh yeah, I forgot to mention I could hear a mechanical failure in one of the drives. So I went into the RAID configuration, found the alarm, and silenced it. This of course required a reboot, and during the reboot I could hear the poor, dead drive. There were also a few BIOS messages to the effect of "Raid SATA 0 offline or rebuilding" (not exactly what it said; I apologize, I didn't write it down).

Long story short, the server booted back up and we soon found that all the data that had been written to disk between the time the alarm went off (i.e. the disk failed) and the time I rebooted was gone. I saved some files post-reboot and they persisted across an additional reboot. But the files that were saved Sunday, yesterday, and today up until the first reboot are gone.

This completely surprises me. RAID-1 is mirrored, so why would data be missing? People in the office started grumbling about all the files they would need to recreate (ah yes, the backup is also missing the files), and I stopped them until I could figure out a bit more about all this. My question to you pros is: Is there anything that can be done to restore that data? Is there a RAID utility or process that should be followed in order to fix the problem? In other words, does what I've described so far sound normal in a failure event, and are there simply some additional steps that need to be taken to tell the RAID that the other disk is dead and to rely on the data that is mirrored on the remaining drive?

I'm fairly comfortable administering our server and the various services it's running, but when it comes to RAID and hardware in general I'm a total newb and considering we've got real-world data at stake I'm reluctant to start trial-n-erroring my way through the process.

Best Answer

It almost sounds like your RAID decided to boot off, or rebuild from, the failed drive. When one drive drops out, the RAID keeps writing to the other drive only; if the system then somehow reboots off the failed one, everything written in between appears to be missing. Perhaps the drive has only partially failed.

Hopefully it actually failed out the drive and didn't try rebuilding.
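To make that scenario concrete, here is a toy model in Python. It is only an illustration of the idea, not how the CERC controller (or any real controller) is implemented; the `ToyMirror` class and the file names are made up for the example.

```python
# Toy model of a two-disk mirror; purely illustrative, not real controller logic.
class ToyMirror:
    def __init__(self):
        self.disks = [{}, {}]        # each disk maps file name -> contents
        self.online = [True, True]   # which members still receive writes

    def write(self, name, data):
        # A mirror sends every write to all members that are still online.
        for disk, up in zip(self.disks, self.online):
            if up:
                disk[name] = data

    def fail(self, index):
        # The member drops out of the array; its contents are frozen (stale).
        self.online[index] = False

    def read_from(self, index):
        # What the system would see if it trusted only this member after a reboot.
        return dict(self.disks[index])


mirror = ToyMirror()
mirror.write("friday.doc", "v1")     # written while both disks were healthy
mirror.fail(0)                       # disk 0 drops out (the alarm starts)
mirror.write("monday.doc", "v1")     # only disk 1 receives this write

stale_view = mirror.read_from(0)     # system comes back up on the stale member
good_view = mirror.read_from(1)      # the member that kept receiving writes

print(sorted(stale_view))            # monday.doc is absent here
print(sorted(good_view))             # both files are present here
```

If this is what happened, the "missing" files may still exist on whichever drive kept receiving writes, which is why it matters that nothing gets rebuilt over it.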

In any case, my first suggestion is this: turn off the system and disconnect one of the drives (start with the one making noises). Then boot it up and see if your data is present. If not, try switching to the other drive so that only it is connected. You might need to boot the system from a live CD of some sort so you can inspect the contents of the drives without changing anything.

If you don't see your data on either drive, then you are most likely out of luck.
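If you manage to get both drives mounted read-only (for example from a live CD), a small script can list what one drive has that the other lacks, rather than eyeballing folders. This is a rough sketch, assuming Python is available in that environment; the mount points you pass in are whatever you chose, and the function names are my own. It only reads file names, sizes, and timestamps and never writes to either tree.

```python
import os
import sys


def list_files(root):
    """Map each file's path relative to root to its (size, mtime)."""
    found = {}
    for folder, _subdirs, names in os.walk(root):
        for name in names:
            full = os.path.join(folder, name)
            try:
                info = os.stat(full)
            except OSError:
                continue  # unreadable entry (e.g. on a damaged disk); skip it
            found[os.path.relpath(full, root)] = (info.st_size, info.st_mtime)
    return found


def compare_trees(root_a, root_b):
    """Return (files only in root_a, files only in root_b, files that differ)."""
    a, b = list_files(root_a), list_files(root_b)
    only_a = sorted(set(a) - set(b))
    only_b = sorted(set(b) - set(a))
    # Same relative path on both sides but different size or modification time.
    differ = sorted(p for p in set(a) & set(b) if a[p] != b[p])
    return only_a, only_b, differ


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python compare_trees.py <mount point of drive A> <mount point of drive B>
    only_a, only_b, differ = compare_trees(sys.argv[1], sys.argv[2])
    for label, paths in (("only on A", only_a), ("only on B", only_b), ("differs", differ)):
        for path in paths:
            print(label + ": " + path)
```

Files that show up only on one side, or that differ, are the ones to copy off to separate storage before anything is rebuilt.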
