MSA20 RAID5 recovery failure due to URE on another disk

failed, hp-modular-smart-array, raid, raid5

I have an MSA20 with a single 12-disk array and 3 LUNs on it (each RAID 5). A few days ago one disk in one of the LUNs failed and I replaced it. But the RAID 5 rebuild failed at 13%, and the ADU report shows that another disk has "Errors Logged = 5566"; according to the SCSI specification this is an unrecoverable read error, or URE (Sense Code=0x11, Qualifier=0x00). The serial log shows the URE as well. It seems the RAID 5 can't be rebuilt because of this. So I have a few questions:

  1. Is there still a way to recover the RAID 5?

  2. If I keep the new replacement disk and remove the disk with the URE, will the other LUNs be destroyed as well, or only the failed LUN? If all LUNs fail, what is the point of giving each LUN its own RAID on one disk group if 2 failed disks can destroy them all?

  3. As I understand it, the preferred approach in future is to create one disk array per LUN, rather than one array holding several LUNs?

Thanks.

Best Answer

1) It is very unlikely that you'll be able to recover this particular array. RAID is not backup. This is one of the many reasons you need backups.

2) It depends on how the LUNs are set up. If you have one RAID 5 array across all 12 disks that is split into 3 logical units, then once the array is gone, all its logical units are gone. If you have three separate RAID 5 arrays with 4 disks each, then only the array containing these two disks is gone, and the other arrays (and hence their logical units) will be fine.
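To make the two layouts concrete, here is a minimal sketch of the failure-domain logic. It is only a model, not how the MSA20 controller decides anything: the disk numbering, the LUN names, and the `lost_luns` helper are all made up for illustration, and the disk with the URE is simply treated as unavailable.

```python
def lost_luns(arrays, bad_disks, tolerance=1):
    """Return the LUNs that live on an array with more bad disks than it tolerates.

    arrays    -- list of dicts with "disks" (disk numbers) and "luns" (names)
    bad_disks -- disk numbers that are failed or unreadable
    tolerance -- bad disks an array survives (1 for RAID 5)
    """
    bad = set(bad_disks)
    lost = []
    for array in arrays:
        if len(bad & set(array["disks"])) > tolerance:
            lost.extend(array["luns"])
    return sorted(lost)


# Layout A (hypothetical): one 12-disk RAID 5 array carved into 3 LUNs.
shared = [{"disks": range(12), "luns": ["LUN1", "LUN2", "LUN3"]}]

# Layout B (hypothetical): three independent 4-disk RAID 5 arrays, one LUN each.
separate = [
    {"disks": range(0, 4), "luns": ["LUN1"]},
    {"disks": range(4, 8), "luns": ["LUN2"]},
    {"disks": range(8, 12), "luns": ["LUN3"]},
]

# Two bad disks that happen to sit next to each other (disks 1 and 2).
print(lost_luns(shared, [1, 2]))    # ['LUN1', 'LUN2', 'LUN3']
print(lost_luns(separate, [1, 2]))  # ['LUN1']
```

With separate arrays, two bad disks that land in different arrays (say disks 1 and 5) lose nothing in this model, since each array is only degraded.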

3) It largely depends on what you want to do. There may be good reasons to have separate arrays on separate disks. For example, you may want to prevent a heavily loaded array from slowing down other arrays. If the arrays are on the same physical disks, you can't do this. Or you may want to allow a heavily loaded array to be able to get all the bandwidth of all the disks. If you have separate arrays on separate disks, you can't do that.

And there are also reasons you might want to put multiple logical units on the same array. You may want to isolate filesystems so that filling one up won't fill up the other.

If you put all the logical units on one array, you lose less space. A single RAID 5 array on 12 1TB disks gives you 11TB usable; divided into three equal portions, that's about 3.67TB each. If you create three separate arrays each with 4 1TB disks, that's 3TB each (9TB in total). So you'd trade off size to get the extra reliability.
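The capacity arithmetic above can be checked with a few lines. This is just a sketch that assumes equal-size 1TB disks and ignores any controller or filesystem overhead; `raid5_usable_tb` is a made-up helper name.

```python
def raid5_usable_tb(disks, disk_tb=1.0):
    """Usable capacity of one RAID 5 array: one disk's worth goes to parity."""
    if disks < 3:
        raise ValueError("RAID 5 needs at least 3 disks")
    return (disks - 1) * disk_tb


# One 12-disk array split into three equal LUNs.
one_array = raid5_usable_tb(12)
per_lun_shared = one_array / 3

# Three separate 4-disk arrays, one LUN each.
per_lun_separate = raid5_usable_tb(4)
total_separate = 3 * per_lun_separate

print(f"one array:       {one_array:.2f} TB total, {per_lun_shared:.2f} TB per LUN")
print(f"separate arrays: {total_separate:.2f} TB total, {per_lun_separate:.2f} TB per LUN")
```

The difference comes from parity: one array gives up a single disk to parity, three arrays give up three.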

Exactly how much flexibility you have, and what affects it, depends on your controller.

And, some advice for the future:

  1. Consider RAID 6. It can tolerate the failure of two drives.

  2. Make 100% sure your arrays are tested regularly and that failed drives are replaced promptly. This will dramatically reduce the chance of a drive failure during a degraded state.

  3. RAID is not backup. Keep regular backups to a physically-separate device.
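As a rough illustration of why a second parity disk matters, here is a back-of-the-envelope estimate of hitting at least one URE while reading every surviving disk during a RAID 5 rebuild. The numbers are assumptions, not taken from your ADU report: a URE rate of 1 per 10^14 bits read (a figure commonly quoted on drive datasheets), 1TB disks, and independent errors. Real drives often do better or worse than their datasheet figure, so treat the result as an order-of-magnitude guide only.

```python
import math


def p_ure_during_rebuild(surviving_disks, disk_tb=1.0, ure_per_bit=1e-14):
    """Probability of at least one URE when every surviving disk is read in full.

    Assumes independent bit errors at a constant rate (ure_per_bit is an
    assumed datasheet-style figure, not a measured value).
    """
    bits_read = surviving_disks * disk_tb * 1e12 * 8
    # 1 - (1 - p)^n, computed via log1p/expm1 to stay accurate for tiny p
    return -math.expm1(bits_read * math.log1p(-ure_per_bit))


# Rebuilding a 12-disk RAID 5 reads the 11 surviving disks;
# rebuilding a 4-disk RAID 5 reads 3.
for disks in (11, 3):
    print(f"{disks} surviving disks: {p_ure_during_rebuild(disks):.0%}")
```

Under these assumptions a wide RAID 5 rebuild is far from a sure thing, which is the case RAID 6 is meant to cover: it can still reconstruct a stripe when one disk is missing and another returns a read error.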

If you have data on there that's not backed up, try to recover as much of it as possible immediately. However, if you can't even get the array to mount, professional recovery is your only hope.
