Rebuilding array on 3ware 9690SA-8I

3ware, hardware-raid, raid, raid10

TL;DR version

  1. RAID10 array working fine
  2. Reboot server as part of maintenance
  3. Array inoperable (no access whatsoever)
  4. Controller logs say a single drive is bad
  5. Remove drive & test – no bad sectors found
  6. Err on the side of caution, replace drive with known good one
  7. Controller won't rebuild array onto new drive
  8. Even with just 1 drive failure, controller has made entire RAID10 array inaccessible

And now the long, detailed version:

I have a RAID10 (8x1TB) array on a 3ware 9690 card running on an Ubuntu 11.10 server.

There was a kernel update, so I scheduled a reboot, after which the array was inaccessible. I checked the status: a drive has died in the array, but the controller has thrown the entire array into an 'inoperable' state instead of simply degraded (what's the point of the RAID now ;-).

After taking out the 'dead' drive I ran a quick test and found it completely functional, without a single bad sector.

I tried putting the drive back in, but the controller still marks the disk as degraded (remembering the serial number or something??) and the entire array as inoperable…

So I swapped it out for a known working drive (not the same capacity but higher, which should still work) and initiated a rebuild with the new drive as a replacement. This fails instantly with the error "(0x0B:0x0033): Unit busy : Failed to start Rebuild on Unit 0". The unit shouldn't be busy as it is not mounted (the card itself is listed with lshw but the array it provides is not).

I'm pretty much at an impasse now. I don't understand how a single drive failure on a RAID10 can make the entire array inaccessible. Degraded I could understand, but inaccessible?? I don't think the controller is faulty, as it was completely functional prior to the reboot.
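
To spell out why I expected 'degraded' rather than 'inoperable', here is a minimal sketch of how a RAID10 should react to drive failures. It assumes the array is a stripe over mirror pairs made of adjacent ports (0+1, 2+3, ...); that pairing is my assumption for illustration, not something I have confirmed for this controller, and raid10_state is a made-up helper name.

```python
def raid10_state(failed, n_drives=8):
    """Expected state of a RAID10 built from adjacent mirror pairs.

    `failed` is a set of failed drive indices. The adjacent-pair layout
    (0+1, 2+3, ...) is an assumption made for this illustration only.
    """
    pairs = [(i, i + 1) for i in range(0, n_drives, 2)]
    # The stripe is lost only if both halves of some mirror are gone.
    lost_pairs = [p for p in pairs if p[0] in failed and p[1] in failed]
    if lost_pairs:
        return "INOPERABLE"
    return "DEGRADED" if failed else "OK"


if __name__ == "__main__":
    print(raid10_state({5}))     # one failed drive
    print(raid10_state({4, 5}))  # both drives of one mirror
```

Under that model a single failed drive can never take out a whole mirror, so the unit should still be readable from the surviving half.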


> info c0

Unit  UnitType  Status         %RCmpl  %V/I/M  Stripe  Size(GB)  Cache  AVrfy
------------------------------------------------------------------------------
u0    RAID-10   INOPERABLE     -       -       256K    3725.25   Ri     ON

VPort Status         Unit Size      Type  Phy Encl-Slot    Model
------------------------------------------------------------------------------
p0    OK             u0   931.51 GB SATA  0   -            SAMSUNG HD103SJ
p1    OK             u0   931.51 GB SATA  1   -            SAMSUNG HD103SJ
p2    OK             u0   931.51 GB SATA  2   -            SAMSUNG HD103SJ
p3    OK             u0   931.51 GB SATA  3   -            SAMSUNG HD103SJ
p4    OK             u0   931.51 GB SATA  4   -            SAMSUNG HD103SJ
p5    OK             -    1.36 TB   SATA  5   -            ST31500341AS
p6    OK             u0   931.51 GB SATA  6   -            SAMSUNG HD103SJ
p7    OK             u0   931.51 GB SATA  7   -            SAMSUNG HD103SJ

> /c0/u0 start rebuild disk=5

Sending rebuild start request to /c0/u0 on 1 disk(s) [5] ... Failed.
(0x0B:0x0033): Unit busy
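
For anyone who wants to check this state from a script, the listing above can be summarized with a few lines of Python. This is only a rough sketch: it assumes the column layout shown in my output (unit status in the third column, the owning unit in the third column of each port row), and the summarize function and shortened SAMPLE text are purely illustrative, not part of any 3ware tool.

```python
import re

# Shortened copy of the `info c0` listing from this post.
SAMPLE = """\
Unit  UnitType  Status         %RCmpl  %V/I/M  Stripe  Size(GB)  Cache  AVrfy
u0    RAID-10   INOPERABLE     -       -       256K    3725.25   Ri     ON

VPort Status         Unit Size      Type  Phy Encl-Slot    Model
p4    OK             u0   931.51 GB SATA  4   -            SAMSUNG HD103SJ
p5    OK             -    1.36 TB   SATA  5   -            ST31500341AS
p6    OK             u0   931.51 GB SATA  6   -            SAMSUNG HD103SJ
"""


def summarize(info_text):
    """Return ({unit: status}, [ports not assigned to any unit])."""
    units, unassigned = {}, []
    for line in info_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # blank lines and separator rows
        if re.fullmatch(r"u\d+", fields[0]):
            units[fields[0]] = fields[2]  # third column is the unit status
        elif re.fullmatch(r"p\d+", fields[0]) and fields[2] == "-":
            unassigned.append(fields[0])  # "-" means no owning unit
    return units, unassigned


if __name__ == "__main__":
    print(summarize(SAMPLE))
```

On my listing this picks out u0 as INOPERABLE and p5 (the replacement drive) as the only port that does not belong to a unit.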

Best Answer

Contacted LSI support, and one of their 2nd-level techs had to write a script & firmware hack to bring the array into a regular degraded state.
From there it was business as usual to join a new disk to the array and rebuild.