Windows – Can someone please explain this RAID Error message

hardware-raid, raid, windows

I have an 8-drive RAID 6 (500×8) on a production server at work.

Yesterday we noticed the server was slow, and after investigating I found that 2 of the drives in the RAID had failed and that the notification system had not been able to send out an email notification. We immediately shut down the server, replaced the 2 failed drives, and let them rebuild before rebooting into Windows.

The boot back into Windows went fine, but at times errors like these pop up:

Puncturing bad block:   PD   Port 0 - 3:0:0      Location   0x209a3686
Puncturing bad block:   PD   Port 4 - 7:0:7      Location   0x209a3686
Unrecoverable medium error during recovery:   PD   Port 0 - 3:0:0      Location   0x209a3686
Puncturing bad block:   PD   Port 0 - 3:0:0      Location   0x209a3686
Puncturing bad block:   PD   Port 4 - 7:0:7      Location   0x209a3686

In addition, VMs running on the machine seem to be unable to complete Windows updates properly. This may or may not be related.

After investigating a little, I ran a consistency check on the VD. Several new messages came up, intermixed with more of the same errors as above:

Consistency Check completed with uncorrectable errors on VD:   0
Consistency Check found inconsistent parity on VD     strip:       ( VD   =   0,   strip       =   1068315)
Consistency Check detected uncorrectable multiple medium     errors:       ( PD   Port 4 - 7:0:7  Location   0x209a3686  VD       0)
Consistency Check found inconsistent parity on VD     strip:       ( VD   =   0,   strip       =   1067493)
Unexpected sense:   PD       =   Port 0 - 3:0:0 Unrecovered read error,   CDB   =    0x28 0x00 0x1f 0xac 0x8c 0x00 0x00 0x02 0x00 0x00    ,   Sense   =    0xf0 0x00 0x03 0x1f 0xac 0x8d 0xdb 0x0a 0x00 0x00 0x00 0x00 0x11 0x00 0x00 0x00 0x00 0x00

I read on one forum that numbers like 3:0:0 mean the problem is with drive 0, though I am not sure, since several different ones are shown here. (We replaced drives 1 and 5.)

Can someone break this down for me? Is there a simple fix, like further replacing and rebuilding other drives?

Thanks in advance

Best Answer

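Usually, once a RAID has punctured blocks, its data can no longer be trusted. A puncture generally means the controller hit an unreadable block during the rebuild with no redundancy left to reconstruct it, so it marked that block as bad. You can try to copy the data somewhere else, but its integrity cannot be guaranteed.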

After copying what you can, destroy the whole RAID structure and create a new one with the initialisation option, to force a sanity check of the disks.

It is best to actively monitor the RAID state, so that you catch the problem at the first disk failure instead of finding out after two have failed.
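As a rough illustration of that kind of monitoring, here is a minimal Python sketch that tallies how often each physical drive is named in the controller's event log. It assumes you can export the log as plain text in the same format as the lines in the question. The function name `count_events_per_drive` and the regular expression are my own, not part of any RAID vendor tool, and the drive label is treated as an opaque string rather than interpreted.

```python
import re
from collections import Counter

# Matches the drive label exactly as printed in the question's log,
# e.g. "Port 0 - 3:0:0". The label is kept as an opaque string.
PD_PATTERN = re.compile(r"PD\s*=?\s*(Port \d+ - \d+:\d+:\d+)")


def count_events_per_drive(log_text):
    """Count how many log lines mention each physical drive label."""
    counts = Counter()
    for line in log_text.splitlines():
        match = PD_PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Sample lines copied from the question.
SAMPLE_LOG = """\
Puncturing bad block:   PD   Port 0 - 3:0:0      Location   0x209a3686
Puncturing bad block:   PD   Port 4 - 7:0:7      Location   0x209a3686
Unrecoverable medium error during recovery:   PD   Port 0 - 3:0:0      Location   0x209a3686
Puncturing bad block:   PD   Port 0 - 3:0:0      Location   0x209a3686
Puncturing bad block:   PD   Port 4 - 7:0:7      Location   0x209a3686
"""

if __name__ == "__main__":
    # Print the drives with the most events first.
    for drive, n in count_events_per_drive(SAMPLE_LOG).most_common():
        print(f"{drive}: {n} event(s)")
```

Run on a schedule against the exported log, something like this would at least show which drives keep turning up in error events, independently of the controller's own email notifications.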

Time to restore from your backups.
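
As for breaking down the "Unexpected sense" line: the CDB and Sense bytes can be decoded by hand. The sketch below assumes the CDB is a standard SCSI READ(10) command (opcode 0x28) and that the sense data is in the fixed format (response code 0x70 or 0x71), which is what the bytes in the question look like. The function names are hypothetical and only the fields relevant here are decoded.

```python
def parse_hex_bytes(text):
    """Turn a string like '0x28 0x00 0x1f' into a bytes object."""
    return bytes(int(token, 16) for token in text.split())


def decode_read10(cdb):
    """Decode the LBA and block count of a 10-byte READ(10) CDB."""
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    return {
        "lba": int.from_bytes(cdb[2:6], "big"),     # bytes 2-5: starting LBA
        "blocks": int.from_bytes(cdb[7:9], "big"),  # bytes 7-8: transfer length
    }


def decode_fixed_sense(sense):
    """Decode the main fields of fixed-format SCSI sense data."""
    if len(sense) < 14 or (sense[0] & 0x7F) not in (0x70, 0x71):
        raise ValueError("not fixed-format sense data")
    return {
        "info_valid": bool(sense[0] & 0x80),              # VALID bit
        "sense_key": sense[2] & 0x0F,                     # 0x3 = MEDIUM ERROR
        "information": int.from_bytes(sense[3:7], "big"), # failing LBA if valid
        "asc": sense[12],                                 # additional sense code
        "ascq": sense[13],                                # its qualifier
    }


# Bytes copied from the log line in the question.
CDB_TEXT = "0x28 0x00 0x1f 0xac 0x8c 0x00 0x00 0x02 0x00 0x00"
SENSE_TEXT = ("0xf0 0x00 0x03 0x1f 0xac 0x8d 0xdb 0x0a 0x00 0x00 "
              "0x00 0x00 0x11 0x00 0x00 0x00 0x00 0x00")

if __name__ == "__main__":
    read = decode_read10(parse_hex_bytes(CDB_TEXT))
    sense = decode_fixed_sense(parse_hex_bytes(SENSE_TEXT))
    print(f"read of {read['blocks']} blocks starting at LBA {read['lba']:#x}")
    print(f"sense key {sense['sense_key']:#x}, "
          f"ASC/ASCQ {sense['asc']:#04x}/{sense['ascq']:#04x}, "
          f"failing LBA {sense['information']:#x}")
```

If I am reading it right, sense key 0x3 is MEDIUM ERROR and ASC/ASCQ 0x11/0x00 is "unrecovered read error", which matches the text in the log line, and the reported address falls inside the range that the read covered. In other words, the drive labelled Port 0 - 3:0:0 reported a sector it could not read, on top of the two drives that had already failed.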