LSI MegaRAID: what does “transient error detected while communicating with PD: -:-:1” mean?


I've got an LSI MegaRAID 9260-16i card running in a server, and it keeps logging the following error:

Controller ID: 0 Transient error detected while communicating with PD: -:-:1

I can't find anything about this message anywhere (documentation, google, forums etc.). What does this message mean?

Best Answer

Apparently this error was due to the type of disks used. LSI responded to my support ticket with the following:

The SAMSUNG HD103UJ has not been qualified as a compatible hard drive. The error and subsequent time-out event is caused by a communication issue due to the error reporting mechanism used by desktop-level hard drives, which are not intended for RAID functionality.

I was not aware that this was an issue, but after more testing I believe this must indeed be the root cause. I've swapped backplanes and SAS cables with no success. I've also run "stress" tests on both the OS virtual disk (using enterprise Dell disks) and the DATA virtual disk (using desktop Samsung disks), and I only received these errors while stress-testing the DATA disks.
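To check whether the errors really cluster on one set of disks during a stress test, a small script can tally the message per PD field in an exported event log. This is a hypothetical sketch: it assumes the log has been saved as plain text with each event on one line, worded exactly as quoted in the question, and the function name `count_transient_errors` and the sample PD value `-:-:4` are illustrative only. It does not interpret what the `-:-:1` field encodes; it just groups identical values.

```python
import re
from collections import Counter

# Matches the message as quoted in the question. The controller ID and the
# PD field (e.g. "-:-:1") are captured verbatim; the PD field is not decoded.
# Assumption: the event log has been exported as plain text, one event per line.
PATTERN = re.compile(
    r"Controller ID:\s*(\d+)\s+Transient error detected while "
    r"communicating with PD:\s*(\S+)"
)


def count_transient_errors(log_text):
    """Return a Counter keyed by (controller_id, pd_field) -> occurrences."""
    counts = Counter()
    for match in PATTERN.finditer(log_text):
        counts[(match.group(1), match.group(2))] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample; "-:-:4" is made up for illustration.
    sample = """
Controller ID: 0 Transient error detected while communicating with PD: -:-:1
Controller ID: 0 Transient error detected while communicating with PD: -:-:1
Controller ID: 0 Transient error detected while communicating with PD: -:-:4
"""
    for (ctrl, pd), n in sorted(count_transient_errors(sample).items()):
        print(f"controller {ctrl}, PD {pd}: {n}")
```

Running it once on a log captured during a stress test of each array would show whether the counts are concentrated on the PD values belonging to one array.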

So, I assume there's no other way around this issue than actually buying enterprise disks, such as the "Western Digital® RE Enterprise 2TB", which LSI supports. So much for trying to reuse hardware.

UPDATE (March 11, 2013)

The controller runs two arrays: a RAID1 using WD enterprise disks and a RAID6 using SAMSUNG desktop disks. This weekend the RAID1 array degraded, and the log was flooded with the error message from my original post. The weird thing is that the RAID1 array uses enterprise disks. Could a problem with one of the SAMSUNG disks in the RAID6 array really cause one of the WD disks to be evicted from the RAID1 array? That seems like odd behaviour to me.

UPDATE (May 29, 2015)

It's been a while since I dealt with this issue. I believe the actual cause was the power supply. I had connected all 4 backplanes to the same power connector (using splitters). At peak power consumption, disks would "fall out" because not enough power could be delivered. I fixed this by using two power connectors instead, each split across two backplanes.