Linux – Full Recovery of Intel Matrix Software-RAID 1

fakeraid fedora linux raid

We have a server running Fedora 8 and Intel's Software RAID in a RAID 1 setup.

One of the disks failed on our server, so I replaced the bad drive and did a "dd if=/dev/sda of=/dev/sdb" to copy the good drive's data to the new drive.

After a reboot I was back up and running; no complaints from Fedora at all.

However, upon bootup the Intel Matrix BIOS still says that the RAID array is in a "Rebuild" state. I can't see anything wrong with the RAID array from within the OS, and the Intel BIOS-based tools don't have any options to rebuild the RAID array.

RAID Array Details

$ pvscan && vgscan && lvscan

PV /dev/dm-2   VG VolGroup00   lvm2 [465.53 GB / 32.00 MB free]
Total: 1 [465.53 GB] / in use: 1 [465.53 GB] / in no VG: 0 [0   ]

Reading all physical volumes.  This may take a while...
Found volume group "VolGroup00" using metadata type lvm2

ACTIVE            '/dev/VolGroup00/LogVol00' [463.56 GB] inherit
ACTIVE            '/dev/VolGroup00/LogVol01' [1.94 GB] inherit

fdisk -l:

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          25      200781   83  Linux
/dev/sda2              26       60800   488175187+  8e  Linux LVM

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1   *           1          25      200781   83  Linux
/dev/sdb2              26       60800   488175187+  8e  Linux LVM

     Device Boot      Start         End      Blocks   Id  System
/dev/dm-0p1   *           1          25      200781   83  Linux
/dev/dm-0p2              26       60800   488175187+  8e  Linux LVM

I originally tried the LVM tools when attempting to rebuild the array, but they didn't work for me as I don't have any /dev/md* devices. dmraid was no help either, so I fell back to the low-level approach and used dd instead.

I'm wondering if my low-level approach is the reason that the RAID array isn't seen by the Intel BIOS as being properly rebuilt.

Updates:

  • Yes, I do have an Amazon S3 backup of the important files on the server.

Best Answer

The Intel RAID is managed by the mainboard and/or its driver. The LVM tools don't even get to see it.

Your Linux seeing sda and sdb means it sees through the mainboard's RAID setup, which is a bad thing (tm).

There are several levels in a RAID: 1) the hardware, 2) what the RAID controller makes of it, 3) what the OS sees. In any reliable RAID system, 2 and 3 are the same. If they aren't the same, questions like yours arise, confusing even the most seasoned admins. In this case, it looks like you got lucky. You did the wrong thing, your RAID setup ignored you, and it is now doing (hopefully) the right thing.

This is not always the case. It is just as likely that you do the right thing, the mainboard RAID ignores you, and it does the wrong thing.

The only safe way to repair any kind of RAID is through the tools of the RAID system itself.
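As a rough sketch of what "the tools of the RAID system" can look like on Linux for this kind of BIOS RAID: dmraid can report the state of a set with `dmraid -s`, and newer dmraid releases document a rebuild option (`-R`). The helper names below are made up for illustration, and whether `-R` exists depends on your dmraid version (check `man dmraid`), so treat this as a starting point under those assumptions, not a tested recovery procedure.

```shell
#!/bin/sh
# Sketch only: the helper function names here are hypothetical.

# Extract the "status" field from `dmraid -s` output read on stdin.
# `dmraid -s` prints "key : value" lines (name, size, type, status, ...).
raid_status() {
    awk -F' *: *' '$1 == "status" { print $2 }'
}

# Print the status of every RAID set dmraid discovers (needs root).
show_raid_status() {
    dmraid -s | raid_status
}

# Ask dmraid to rebuild a set onto a replacement disk (needs root).
# Assumption: your dmraid build supports -R/--rebuild; older ones may not.
# $1 = set name as shown by `dmraid -s`, $2 = new disk, e.g. /dev/sdb
rebuild_raid_set() {
    dmraid -R "$1" "$2"
}
```

If `show_raid_status` prints anything other than `ok`, the set probably still needs attention from the RAID tooling, no matter how healthy the filesystem looks from inside the OS.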

What the Intel driver is now doing is a dd of its own, which it calls a rebuild. Of course, it didn't see what your dd did! It has no idea where the data written by dd came from, and can't know that it is, in fact, the correct data. So it has to do the copying itself. For all the poor thing knows, it could be grandma's collection of turkey recipes.
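The controller can't verify the copy, but you can check it yourself. A minimal sketch, assuming GNU cmp (for the `-n` byte limit); `same_prefix` is a hypothetical helper name and the byte count is a placeholder you would choose yourself. The comparison only means something while nothing is writing to either disk, for example from a rescue system.

```shell
#!/bin/sh
# Sketch only: same_prefix is a hypothetical helper name.

# Succeed if the first $3 bytes of $1 and $2 are identical.
# Assumes GNU cmp: -n limits the bytes compared, -s suppresses output.
same_prefix() {
    cmp -s -n "$3" "$1" "$2"
}

# Example (read-only, needs root), comparing the first MiB of both disks:
#   same_prefix /dev/sda /dev/sdb 1048576 && echo "first MiB matches"
```

Even a perfect match only tells you the data is identical; it does not change what the controller has recorded about the array's state.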

For any good, solid, proper RAID setup, things have to be deterministic. Mainboard RAIDs usually aren't (they depend on BIOS version, driver version, OS, etc.). The admin has to train him/herself to repair the RAID. If you put any kind of important data on a RAID, you must work through some of its failure scenarios yourself. If you don't, you'd probably be better off without a RAID. It turns out that, most of the time, only OS software RAIDs and RAID-card RAIDs are deterministic. The mainboard/driver RAID mix that almost every board has is not much more than a placebo.

P.S. Do you have a backup?