How to make a RAID controller rescan devices

jbod multipath raid zfs

I have the following setup:

A single server with two LSI MegaRAID SAS 9380-8e controllers, both connected to two 60-bay disk shelves, roughly following the design by Edmund White (see https://github.com/ewwhite/zfs-ha/wiki). The goal is to replicate that setup exactly, but it's currently mid-migration.

After wiring the first shelf, all 60 disks were seen by both controllers, and multipathing was set up and works smoothly. When the second disk shelf was added, its 60 disks still held some old RAID configuration, which was dutifully reported by both controllers. Using the first controller (c0), I removed the configuration from the disks and set them to JBOD. All 60 disks are now visible to the OS and can be registered with multipath, but they report only a single path (going through the first controller). The second controller (c1) still reports all 60 disks as foreign (UGood F), and there is seemingly no way to force that controller to rescan the devices or forget the current config for just this shelf:

# /opt/MegaRAID/storcli/storcli64 /c1 /e71 /sall show | head -n20
Controller = 1
Status = Success
Description = Show Drive Information Succeeded.


Drive Information :
=================

-----------------------------------------------------------------------
EID:Slt DID State DG     Size Intf Med SED PI SeSz Model            Sp 
-----------------------------------------------------------------------
71:0     74 UGood F  3.637 TB SAS  HDD N   N  512B HUS724040ALS640  D  
71:1    107 UGood F  3.637 TB SAS  HDD N   N  512B HUS724040ALS640  D  
71:2     72 UGood F  3.637 TB SAS  HDD N   N  512B HUS724040ALS640  D  
71:3     95 UGood F  3.637 TB SAS  HDD N   N  512B HUS724040ALS640  D  
71:4     90 UGood F  3.637 TB SAS  HDD N   N  512B HUS724040ALS640  D  
71:5     77 UGood F  3.637 TB SAS  HDD N   N  512B HUS724040ALS640  D  
71:6     73 UGood F  3.637 TB SAS  HDD N   N  512B HUS724040ALS640  D  
71:7     76 UGood F  3.637 TB SAS  HDD N   N  512B HUS724040ALS640  D  
71:8     83 UGood F  3.637 TB SAS  HDD N   N  512B HUS724040ALS640  D  

This is the same shelf as seen by the other controller:

# /opt/MegaRAID/storcli/storcli64 /c0 /e165 /sall show | head -n20
Controller = 0
Status = Success
Description = Show Drive Information Succeeded.


Drive Information :
=================

-----------------------------------------------------------------------
EID:Slt DID State DG     Size Intf Med SED PI SeSz Model            Sp 
-----------------------------------------------------------------------
165:0   127 JBOD  -  3.637 TB SAS  HDD N   N  512B HUS724040ALS640  U  
165:1   121 JBOD  -  3.637 TB SAS  HDD N   N  512B HUS724040ALS640  U  
165:2   118 JBOD  -  3.637 TB SAS  HDD N   N  512B HUS724040ALS640  U  
165:3   116 JBOD  -  3.637 TB SAS  HDD N   N  512B HUS724040ALS640  U  
165:4   146 JBOD  -  3.637 TB SAS  HDD N   N  512B HUS724040ALS640  U  
165:5   122 JBOD  -  3.637 TB SAS  HDD N   N  512B HUS724040ALS640  U  
165:6   115 JBOD  -  3.637 TB SAS  HDD N   N  512B HUS724040ALS640  U  
165:7   142 JBOD  -  3.637 TB SAS  HDD N   N  512B HUS724040ALS640  U  
165:8   145 JBOD  -  3.637 TB SAS  HDD N   N  512B HUS724040ALS640  U  
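
To compare the two views without reading 60 rows by eye, the drive table can be tallied by state. The following is only a minimal sketch: `drive_states` is a hypothetical helper name, and it assumes the table layout shown above (column 1 is EID:Slt, column 3 is State, column 4 is DG).

```shell
# Summarize drive states from `storcli ... show` output read on stdin.
# Assumption: rows look like the tables above, e.g.
#   71:0  74 UGood F  3.637 TB ...   or   165:0 127 JBOD - 3.637 TB ...
drive_states() {
    # Only rows whose first field is EID:Slt are drive rows; headers and
    # separator lines are skipped. Count each "State DG" combination.
    awk '$1 ~ /^[0-9]+:[0-9]+$/ { count[$3 " " $4]++ }
         END { for (s in count) print count[s], s }' | sort -k2
}
```

Usage would be along the lines of `/opt/MegaRAID/storcli/storcli64 /c1 /e71 /sall show | drive_states`, run once per controller and enclosure.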

But trying to clear the (wrong) info from the second controller does not work:

# /opt/MegaRAID/storcli/storcli64 /c1 /fall show
Controller = 1
Status = Success
Description = Couldn't find any foreign Configuration

# /opt/MegaRAID/storcli/storcli64 /c1 /fall delete
Controller = 1
Status = Success
Description = Couldn't find any foreign Configuration

# /opt/MegaRAID/storcli/storcli64 /c1 /fall import
Controller = 1
Status = Success
Description = Couldn't find any foreign Configuration

Forcing the disks into JBOD on the second controller does not work either:

# /opt/MegaRAID/storcli/storcli64 /c1 /e71 /sall set jbod | head -n20
Controller = 1
Status = Failure
Description = Set Drive JBOD Failed.

Detailed Status :
===============

-------------------------------------------------
Drive       Status  ErrCd ErrMsg                 
-------------------------------------------------
/c1/e71/s0  Failure   255 Operation not allowed. 
/c1/e71/s1  Failure   255 Operation not allowed. 
/c1/e71/s2  Failure   255 Operation not allowed. 
/c1/e71/s3  Failure   255 Operation not allowed. 
/c1/e71/s4  Failure   255 Operation not allowed. 
/c1/e71/s5  Failure   255 Operation not allowed. 
/c1/e71/s6  Failure   255 Operation not allowed. 
/c1/e71/s7  Failure   255 Operation not allowed. 
/c1/e71/s8  Failure   255 Operation not allowed. 
/c1/e71/s9  Failure   255 Operation not allowed. 

Is there any way to tell the RAID controller that those disks no longer have a foreign config and should be seen as JBODs?

Best Answer

Restart the out-of-sync controller (e.g. c1):

/opt/MegaRAID/storcli/storcli64 /c1 restart
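
To confirm that the restart actually cleared the stale state, the drive list can be checked again afterwards. This is a rough sketch rather than a tested procedure: `restart_and_check` is a hypothetical helper, it assumes storcli exits non-zero when the restart fails, that the enclosure ID is unchanged after the restart, and that foreign drives still show `F` in the DG column as in the question. The controller may also need some time to come back before the second command succeeds.

```shell
# Path to storcli as used in the question; override via the environment if needed.
STORCLI="${STORCLI:-/opt/MegaRAID/storcli/storcli64}"

# Restart one controller, then count drives in one enclosure that are still
# flagged foreign. Arguments: controller number, enclosure ID (e.g. 1 71).
restart_and_check() {
    ctrl="$1"
    encl="$2"
    # Assumption: a failed restart is reported through the exit status.
    "$STORCLI" "/c$ctrl" restart || return 1
    # Drive rows start with EID:Slt; the 4th column (DG) is "F" for foreign.
    foreign=$("$STORCLI" "/c$ctrl" "/e$encl" /sall show |
        awk '$1 ~ /^[0-9]+:[0-9]+$/ && $4 == "F" { n++ } END { print n + 0 }')
    echo "c$ctrl e$encl: $foreign drive(s) still foreign"
    # Succeed only if no drive is still foreign.
    [ "$foreign" -eq 0 ]
}
```

For the setup in the question this would be called as `restart_and_check 1 71`. If it reports zero foreign drives, the disks should then be available as a second path for multipath.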