PowerEdge 6650 disk issues

Tags: dell, dell-poweredge, raid, scsi

Here's a weird one I've been fighting for a while. I've got an old out-of-warranty Dell PowerEdge 6650 server with a PERC 3/DC RAID controller controlling four newer (maybe a year old) Fujitsu 136GB U320 SCSI disks in a RAID5 array.

Maybe once a month or so, one of these disks will randomly "fail." By "fail," I mean that the PERC decides the disk has failed and starts beeping and blasting alerts. All I have to do to resolve the issue is remove and reseat the "failed" disk, and the controller starts resyncing the array. Once the resync is complete, the bezel light on the front of the machine goes back from orange to blue and the beeping stops.

My main question is what is causing these disks to "fail," when in fact they're perfectly fine. At first I thought it might be a firmware issue, so I reflashed every flashable component in the system. BIOS, PERC firmware, disk firmware, everything.

There doesn't seem to be a cause or event that triggers one of these non-failures; it just happens at random.

It's not exactly a huge issue, but it's definitely something I'd like to resolve. Dell won't provide support since the machine is out of warranty, and their website/forums are useless as always.

Best Answer

I like running old hardware as long as possible, but I'd get the machine replaced. You're going to have a tough time making any headway in resolving this issue.

My suspicion would be a subtle interaction between the firmware on the "failing" drives, possibly the hot-swap backplane, and the RAID controller. No one at either Dell or Fujitsu is testing those drives with that controller anymore, and you're unlikely to get anyone at either company interested.

You're putting the array at risk each time this happens, since the array becomes degraded and has to be rebuilt. If a legitimate failure happens on another disk during the rebuild, you're going to be in an array failure scenario. Hopefully you've got good backups.
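To put a rough number on that exposure, here is a minimal back-of-the-envelope sketch. Everything in it is an assumption for illustration, not a measurement from this machine: the 3% annual failure rate, the 6-hour rebuild, the 12 rebuilds per year, and the constant-failure-rate model itself. The function names are made up for this example.

```python
import math

def second_failure_probability(remaining_disks, annual_failure_rate, rebuild_hours):
    """Chance that at least one surviving disk fails during one rebuild window.

    Assumes independent disks with a constant failure rate (exponential model).
    """
    hours_per_year = 24 * 365
    # Convert the assumed annual failure probability into an hourly hazard rate.
    hourly_rate = -math.log(1.0 - annual_failure_rate) / hours_per_year
    # Probability that every remaining disk survives the whole rebuild window.
    all_survive = math.exp(-hourly_rate * rebuild_hours * remaining_disks)
    return 1.0 - all_survive

def cumulative_risk(per_rebuild_probability, rebuilds):
    """Chance of at least one bad outcome across several independent rebuilds."""
    return 1.0 - (1.0 - per_rebuild_probability) ** rebuilds

# Hypothetical inputs: 3 surviving disks in a 4-disk RAID5, 3% annual
# failure rate, 6-hour rebuild, roughly one spurious "failure" per month.
per_rebuild = second_failure_probability(
    remaining_disks=3, annual_failure_rate=0.03, rebuild_hours=6
)
per_year = cumulative_risk(per_rebuild, rebuilds=12)
print(f"per rebuild: {per_rebuild:.5%}, over 12 rebuilds: {per_year:.4%}")
```

Treat the result as a floor rather than an estimate. A RAID5 rebuild reads every surviving disk end to end, so a single unreadable sector on any of them can also break the rebuild, and this simple model ignores that entirely.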

It's frustrating because adding disks really should work fine, but with something this age you're really better off biting the bullet and getting something with active manufacturer support.
