LSI MegaRAID SAS 9261-8i: Disk isn’t recognized after replacement

Tags: hard drive, hardware-raid, megacli, megaraid, raid

I have a Supermicro server with an LSI MegaRAID SAS 9261-8i RAID controller. Three disks were attached to the controller and configured as a RAID5 array. One of the disks failed recently (the RAID was shown as degraded), and the S.M.A.R.T. information confirmed that it had to be replaced.

I marked the drive as missing using storcli and removed it to ship it back to the vendor. Now the replacement has arrived; I plugged it into the RAID controller, but nothing happened. This is what storcli says:

storcli /c0 show

TOPOLOGY :
========

------------------------------------------------------------------------
DG Arr Row EID:Slot DID Type  State BT     Size PDC  PI SED DS3  FSpace 
------------------------------------------------------------------------
 0 -   -   -        -   RAID5 Dgrd  N  5.456 TB dflt N  N   none Y      
 0 0   -   -        -   RAID5 Dgrd  N  5.456 TB dflt N  N   none Y      
 0 0   0   -        -   DRIVE Msng  -  2.728 TB -    -  -   -    -      
 0 0   1   252:5    14  DRIVE Onln  N  2.728 TB dflt N  N   none -      
 0 0   2   252:2    11  DRIVE Onln  N  2.728 TB dflt N  N   none -      
------------------------------------------------------------------------

As you can see, both drives in Slots 2 and 5 are online, and another drive of the Device Group (DG) is marked as missing. The third drive used to be in Slot 0, while the replacement is now in Slot 1. But the new drive isn't recognized by the controller, as you can also see in the physical drive list (output from the same command as above):

Physical Drives = 2

PD LIST :
=======

-----------------------------------------------------------------------------
EID:Slt DID State DG     Size Intf Med SED PI SeSz Model                  Sp 
-----------------------------------------------------------------------------
252:2    11 Onln   0 2.728 TB SATA HDD N   N  512B WDC WD3000FYYZ-01UL1B0 U  
252:5    14 Onln   0 2.728 TB SATA HDD N   N  512B WDC WD3000FYYZ-01UL1B0 U  
-----------------------------------------------------------------------------

In contrast to that, see the following output:

storcli /c0/pall show

PhyInfo :
=======

----------------------------------------------------------------------------
PhyNo SAS_Addr           Phy_Identifier Link_Speed Device_Type  Description 
----------------------------------------------------------------------------
    0 0x0000000000000000              0 No limit   -            -           
    1 0x4433221101000000              0 No limit   End Device   -           
    2 0x0000000000000000              0 No limit   -            -           
    3 0x0000000000000000              0 No limit   -            -           
    4 0x4433221104000000              0 No limit   End Device   -           
    5 0x0000000000000000              0 No limit   -            -           
    6 0x4433221106000000              0 No limit   End Device   -           
    7 0x0000000000000000              0 No limit   -            -           
----------------------------------------------------------------------------

I guess that PhyNo 1 is the replacement drive, but this is the only command where I can find any trace of it. All slot-specific commands for Slot 1 end with "Drive not found".
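The mismatch between the two outputs can be made explicit with a small script. This is only a sketch: it works on the table rows pasted from the outputs above, not on a live controller, and the function names (`attached_phys`, `listed_drives`) are made up for illustration. It does not assume any mapping between PHY numbers and slot numbers; it only compares how many PHYs report an attached end device with how many drives the controller actually lists.

```python
import re

# Rows copied from the "storcli /c0/pall show" output above.
PHY_INFO = """\
    0 0x0000000000000000              0 No limit   -            -
    1 0x4433221101000000              0 No limit   End Device   -
    2 0x0000000000000000              0 No limit   -            -
    3 0x0000000000000000              0 No limit   -            -
    4 0x4433221104000000              0 No limit   End Device   -
    5 0x0000000000000000              0 No limit   -            -
    6 0x4433221106000000              0 No limit   End Device   -
    7 0x0000000000000000              0 No limit   -            -
"""

# Rows copied from the PD LIST section of "storcli /c0 show" above.
PD_LIST = """\
252:2    11 Onln   0 2.728 TB SATA HDD N   N  512B WDC WD3000FYYZ-01UL1B0 U
252:5    14 Onln   0 2.728 TB SATA HDD N   N  512B WDC WD3000FYYZ-01UL1B0 U
"""


def attached_phys(phy_text):
    """Return the PHY numbers whose row reports an attached end device."""
    phys = []
    for line in phy_text.splitlines():
        m = re.match(r"\s*(\d+)\s+0x[0-9a-fA-F]{16}\s+\d+\s+.*End Device", line)
        if m:
            phys.append(int(m.group(1)))
    return phys


def listed_drives(pd_text):
    """Return the EID:Slot identifiers of the drives in the PD list."""
    return re.findall(r"^\s*(\d+:\d+)\s+\d+\s+\S+", pd_text, flags=re.MULTILINE)


phys = attached_phys(PHY_INFO)
drives = listed_drives(PD_LIST)
unaccounted = len(phys) - len(drives)

print("PHYs with an end device:", phys)
print("Drives listed by the controller:", drives)
print("Attached devices not listed as drives:", unaccounted)
```

With the outputs above, three PHYs see a device but only two drives are listed, so one device is connected at the link level without being presented as a physical drive.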

Any ideas? I tested the replacement drive in a second server with exactly the same setup (including the same RAID controller), where the controller detected the drive instantly and marked it as UGood (Unconfigured Good), so it shouldn't be a drive fault. I also rebooted several times, shut the server down for a few minutes, and tried to detect the new drive from the LSI MegaRAID BIOS during boot, all without success. The drive doesn't show up in the LSI MegaRAID BIOS boot message either.

Any hints would be much appreciated.

Best Answer

As it turns out, this behaviour was caused by a dying HDD, namely the replacement drive itself. I didn't catch it because the second server recognized the new HDD without problems, but maybe that was the last breath of this brand-new hard drive.

I didn't expect a dead-on-arrival defect from data-center-grade HDDs (WD RE series, before you ask). I'll keep that possibility in mind in the future, before I waste hours of my time.
