How to fix LSI MegaRaid RAID5 after 1 disk failed

lsimegaraid

My LSI MegaRaid just told me one disk is "UBad" which I assume means it failed:

EID:Slt DID State DG     Size Intf Med SED PI SeSz Model                Sp Type 
--------------------------------------------------------------------------------
252:7    13 UBad   F 2.728 TB SATA HDD N   N  512B WDC WD30EFRX-68EUZN0 U  -

I have a hot spare installed:

EID:Slt DID State DG     Size Intf Med SED PI SeSz Model                Sp Type 
--------------------------------------------------------------------------------
252:6    14 DHS    0 2.728 TB SATA HDD N   N  512B WDC WD30EFRX-68EUZN0 D -

but the status of the hot spare didn't change. Is it being used to save my RAID array?

If not, how do I tell the controller to add the hot spare to the disk group 0?

Best Answer

First, get some information about your controller, volumes and drive:

storcli /c0 show all

/c0 selects the controller to query. If you are unsure of the number, try /cALL and look for lines starting with Controller = to find the controller numbers.

We need the EID and Slot/Slt values from the output. Both the TOPOLOGY and PD LIST sections show them:

TOPOLOGY :
========

---------------------------------------------------------------------------
DG Arr Row EID:Slot DID Type  State BT     Size PDC  PI SED DS3  FSpace TR 
---------------------------------------------------------------------------
 0 -   -   -        -   RAID5 Optl  Y  8.185 TB dflt N  N   none N      N  
 0 0   -   -        -   RAID5 Optl  Y  8.185 TB dflt N  N   none N      N  
 0 0   0   252:0    10  DRIVE Onln  N  2.728 TB dflt N  N   none -      N  
 0 0   1   252:1    9   DRIVE Onln  N  2.728 TB dflt N  N   none -      N  
 0 0   2   252:2    11  DRIVE Onln  N  2.728 TB dflt N  N   none -      N  
 0 0   3   252:3    8   DRIVE Onln  N  2.728 TB dflt N  N   none -      N  
 0 -   -   252:7    13  DRIVE DHS   -  2.728 TB -    -  -   -    -      N  
 0 -   -   252:6    14  DRIVE DHS   -  2.728 TB -    -  -   -    -      N  
---------------------------------------------------------------------------
...
PD LIST :
=======
--------------------------------------------------------------------------------
EID:Slt DID State DG     Size Intf Med SED PI SeSz Model                Sp Type 
--------------------------------------------------------------------------------
252:0    10 Onln   0 2.728 TB SATA HDD N   N  512B WDC WD30EFRX-68AX9N0 U  -    
252:1     9 Onln   0 2.728 TB SATA HDD N   N  512B WDC WD30EFRX-68AX9N0 U  -    
252:2    11 Onln   0 2.728 TB SATA HDD N   N  512B WDC WD30EFRX-68EUZN0 U  -    
252:3     8 Onln   0 2.728 TB SATA HDD N   N  512B WDC WD30EFRX-68EUZN0 U  -    
252:4    12 Onln   - 2.728 TB SATA HDD N   N  512B WDC WD30EFRX-68EUZN0 U  -    
252:6    14 DHS    0 2.728 TB SATA HDD N   N  512B WDC WD30EFRX-68EUZN0 U  -    
252:7    13 DHS    0 2.728 TB SATA HDD N   N  512B WDC WD30EFRX-68EUZN0 U  -    
--------------------------------------------------------------------------------
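As a rough sketch, the EID, slot and DID of every drive in a given state can be pulled out of the PD LIST rows with awk. The helper name pd_by_state is made up for illustration, and the column order (EID:Slt, DID, State) is assumed to match the listing above; other storcli versions may print it differently.

```shell
# pd_by_state STATE
# Hypothetical helper: reads storcli PD LIST text on stdin and prints
# "EID Slt DID" for every drive whose State column equals STATE.
pd_by_state() {
    awk -v want="$1" '
        # Drive rows start with EID:Slt (for example 252:7); headers,
        # rulers and TOPOLOGY rows do not match this pattern.
        $1 ~ /^[0-9]+:[0-9]+$/ && $3 == want {
            split($1, loc, ":")
            print loc[1], loc[2], $2
        }'
}

# Usage on a machine with the controller:
#   storcli /c0 show all | pd_by_state UBad
```

The first two numbers printed are what /e and /s expect; the third is the DID, which smartctl uses.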

To make the controller reconsider the drive, set it to good:

storcli /c0 /e252 /s7 set good

/e252 is the enclosure (EID in the output) and /s7 is the slot (Slt in the output). Note that the slot is not the same number as the DID.
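Because the slot and the DID are easy to mix up, it can help to build the drive path straight from the EID:Slt column. The function below is only an illustrative sketch; drive_path is a made-up name, and it emits the /cx/ex/sx form without spaces.

```shell
# drive_path CONTROLLER EID:SLT
# Hypothetical helper: turns a controller number and an "EID:Slt" value
# copied from the PD LIST into a storcli drive path.
drive_path() {
    ctrl=$1
    eid=${2%%:*}    # part before the colon: enclosure ID
    slot=${2#*:}    # part after the colon: slot number
    printf '/c%s/e%s/s%s\n' "$ctrl" "$eid" "$slot"
}

# Usage on a machine with the controller:
#   storcli "$(drive_path 0 252:7)" set good
```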

The state of the disk should now be UGood (unconfigured good):

EID:Slt DID State DG     Size Intf Med SED PI SeSz Model                Sp Type 
--------------------------------------------------------------------------------
252:7    13 UGood  F 2.728 TB SATA HDD N   N  512B WDC WD30EFRX-68EUZN0 U  -

If the controller already knew the disk (it was installed before but, for some reason, the controller decided it was bad), it may show up as DHS (dedicated hot spare) instead.

To check that the disk is OK, run a self test:

smartctl -d megaraid,14 /dev/sdb -t long

14 is the DID (device ID) from the PD LIST (storcli /c0 show all). Substitute the DID of the disk you want to test, for example 13 for the disk in slot 7 above. /dev/sdb is the Linux block device exposed by the megaraid driver. -t long starts a long self-test.

To find the Linux device, use lsscsi:

 [6:2:0:0]    disk    LSI      MR9260-8i        2.13  /dev/sdb
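
On a machine with several disks, the matching device can be filtered out of the lsscsi listing. This is a sketch that assumes the vendor column reads LSI as in the line above (other controller generations may report a different vendor string); lsi_block_devices is a made-up name.

```shell
# lsi_block_devices
# Hypothetical helper: reads lsscsi output on stdin and prints the device
# node (last column) of every disk whose vendor column is "LSI".
lsi_block_devices() {
    awk '$2 == "disk" && $3 == "LSI" { print $NF }'
}

# Usage:
#   lsscsi | lsi_block_devices
```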

To check the status of the self-test, use smartctl -d megaraid,14 /dev/sdb -c or smartctl -d megaraid,14 /dev/sdb -a.
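
A long self-test on a 3 TB disk takes hours, so a small polling loop may be handy. The sketch below assumes that the -c output of an ATA disk contains the text "Self-test routine in progress" while the test is running; the wording can differ between smartctl versions and drive types, so verify it against your own output first. selftest_running is a made-up name.

```shell
# selftest_running
# Hypothetical helper: reads "smartctl -c" output on stdin and succeeds
# (exit status 0) while a self-test is reported as in progress.
# Assumption: the status text contains "Self-test routine in progress".
selftest_running() {
    grep -q 'Self-test routine in progress'
}

# Usage on a machine with the controller (checks every 10 minutes):
#   while smartctl -d megaraid,14 /dev/sdb -c | selftest_running; do
#       sleep 600
#   done
#   smartctl -d megaraid,14 /dev/sdb -a
```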