ProLiant DL180 G6 with Smart Array P410 failed logical drive (keeps failing and needing rebuild)

hp, hp-proliant, raid

I have an issue with a number of DL180 servers, each with a Smart Array P410 controller and two logical drives: one holds the root filesystem, and the other is a largish 10TB filesystem that is exported over NFS.

The boxes are primarily NFS servers; they are frequently maxed out and are the bottleneck in the processing chain.

Every so often one of these 10TB logical drives fails and needs to be rebuilt. This happens about once a month, and it is a pain.

The message is "Message: This logical drive has failed and cannot be used. All data on this logical drive has been lost."

We have tried updating the firmware on the disk array and the kernel module. We have also tried various flavours of Linux for the host OS (Debian, CentOS), and both xfs and ext3 as filesystem types. However, the logical drives still regularly need rebuilding from backups.

I have attached the hpacucli diagnostic output for one of the failed drives: http://pastebin.com/9zTiuSAN

Some interesting output items:

Smart Array P410 in slot 1 : Identify Controller
RAM Firmware Revision 2.00
ROM Firmware Revision 2.00

Any suggestions on what might be the problem, or how I might go about instrumenting these arrays/disks to get an idea of what is causing the drive to fail?

# cat output.txt  | grep -B 2 'Drive Firmware Rev'
   Drive Model                          ATA     GB1000EAMYC     
   Drive Serial Number                  WMATV2509266        
   Drive Firmware Revision              HPG2    
--
   Drive Model                          ATA     GB1000EAMYC     
   Drive Serial Number                  WMATV1739564        
   Drive Firmware Revision              HPG2    
--
   Drive Model                          ATA     GB1000EAFJL     
   Drive Serial Number                  9QJ456MN            
   Drive Firmware Revision              HPG8    
--
   Drive Model                          ATA     GB1000EAFJL     
   Drive Serial Number                  9QJ45RS3            
   Drive Firmware Revision              HPG8    
--
   Drive Model                          ATA     GB1000EAFJL     
   Drive Serial Number                  9QJ460P0            
   Drive Firmware Revision              HPG8    
--
   Drive Model                          ATA     GB1000EAFJL     
   Drive Serial Number                  9QJ454YN            
   Drive Firmware Revision              HPG8    
--
   Drive Model                          ATA     GB1000EAFJL     
   Drive Serial Number                  9QJ4664M            
   Drive Firmware Revision              HPG8    
--
   Drive Model                          ATA     GB1000EAFJL     
   Drive Serial Number                  9QJ457M9            
   Drive Firmware Revision              HPG8    
--
   Drive Model                          ATA     GB1000EAFJL     
   Drive Serial Number                  9QJ46Q9E            
   Drive Firmware Revision              HPG8    
--
   Drive Model                          ATA     GB1000EAFJL     
   Drive Serial Number                  9QJ4630X            
   Drive Firmware Revision              HPG8    
--
   Drive Model                          ATA     GB1000EAFJL     
   Drive Serial Number                  9QJ454PD            
   Drive Firmware Revision              HPG8    
--
   Drive Model                          ATA     GB1000EAFJL     
   Drive Serial Number                  9QJ45Z0Y            
   Drive Firmware Revision              HPG8    
--
   Drive Model                          HP      DF0146B8052     
   Drive Serial Number                  3QN1KS7H00009949SQ4M
   Drive Firmware Revision              HPD5
--
   Drive Model                          HP      DF0146B8052     
   Drive Serial Number                  3QN1KNFS00009949UX4F
   Drive Firmware Revision              HPD5

Best Answer

We had a similar issue with drives failing, and an HP KB article indicated that the drive firmware was the cause. Updating the drive firmware is supposed to address it. I was unable to open your pastebin link to see whether it listed the drive firmware versions.
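
To compare the installed drive firmware against whatever revisions the HP advisory names, it may help to tally the drives by model and firmware revision. The following is only a rough sketch: it assumes the "Drive Model" / "Drive Firmware Revision" line layout shown in the question's output.txt, and the function name summarize_firmware is made up for illustration. Other hpacucli versions may format the report differently.

```shell
# Count drives per (model, firmware revision) pair in an hpacucli
# diagnostic report. Assumes each "Drive Model" line is followed
# (not necessarily immediately) by its "Drive Firmware Revision" line,
# as in the output.txt excerpt in the question.
summarize_firmware() {
    awk '
        /Drive Model/ {
            # Drop the label, then collapse runs of whitespace in the value.
            sub(/^.*Drive Model[ \t]+/, "")
            gsub(/[ \t]+/, " ")
            sub(/ $/, "")
            model = $0
        }
        /Drive Firmware Revision/ {
            # The firmware revision is the last field on the line.
            count[model " " $NF]++
        }
        END {
            for (key in count) print count[key], key
        }
    ' "$1" | sort -rn
}
```

Running `summarize_firmware output.txt` prints one line per model and firmware combination, prefixed with the number of drives, most common first. Any model still on an older revision than the rest of its batch would stand out quickly.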