Oracle 11gR2 – How to recover from normal redundancy when 1 of 2 failure groups goes down

Tags: oracle, oracle-asm

Background info:

  • Oracle 11gR2
  • 2 failure groups – normal redundancy
  • Each failure group associated with a single disk

Failure Group Alpha is on Disk 1 and Failure Group Bravo is on Disk 2.

We recently ran into an issue on one of our Oracle servers. One of our disks (let's call it Disk 1/Failure Group Alpha) failed while Oracle was running. When we restarted the server Oracle would not come up because we did not have enough disks to satisfy our redundancy requirements.

How can we recover from this failure?

  • Is there a way to tell Oracle to start the instance with a failure group down? We don't mind running without redundancy for now; we will fix the disk issue later.
  • Or do we have to pop a new disk in before the instance can be brought up again?

This happened in one of our staging areas, and we would like to work out what could be done in the future… particularly if a spare disk was not available.

Best Answer

The diskgroup will not be mounted automatically, but you can do it manually:

ALTER DISKGROUP your_disk_group_name MOUNT FORCE;
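For context, a minimal sketch of how the forced mount might be run and checked from SQL*Plus on the ASM instance. The disk group name DATA is a placeholder, not something from the question, and the session is assumed to be connected AS SYSASM.

```sql
-- Assumption: connected to the ASM instance AS SYSASM.
-- DATA is a hypothetical disk group name; substitute your own.

-- 1. Confirm that the disk group is currently dismounted.
SELECT name, state, type
  FROM v$asm_diskgroup;

-- 2. Mount it even though the disks of one failure group cannot be discovered.
ALTER DISKGROUP data MOUNT FORCE;

-- 3. Verify the mount and see how many disks ASM has taken offline.
SELECT name, state, type, offline_disks
  FROM v$asm_diskgroup;
```

Once the disk group is mounted, the database instance that depends on it can be started as usual.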

Mounting Disk Groups Using the FORCE Option

In the FORCE mode, ASM attempts to mount the disk group even if it cannot discover all of the devices that belong to the disk group. This setting is useful if some of the disks in a normal or high redundancy disk group became unavailable while the disk group was dismounted.

If ASM discovers all of the disks in the disk group, then MOUNT FORCE fails. Therefore, use the MOUNT FORCE setting only if some disks are unavailable. Otherwise, use NOFORCE [the default].

The disk group mount succeeds if ASM finds at least one complete set of extents in a disk group. If ASM determines that one or more disks are not available, then ASM moves those disks off line and drops [sic!] the disks after the DISK_REPAIR_TIME expires.
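The offline-then-drop behaviour described above can be watched and, if needed, delayed. The following is only a sketch: the disk group DATA and failure group ALPHA are placeholder names, and the disk_repair_time attribute is only available when the disk group's compatible.asm and compatible.rdbms attributes are 11.1 or higher.

```sql
-- Assumption: connected to the ASM instance AS SYSASM.
-- DATA and ALPHA are hypothetical disk group / failure group names.

-- Show which disks are offline and how many seconds remain before
-- ASM drops them (REPAIR_TIMER).
SELECT name, failgroup, mode_status, state, repair_timer
  FROM v$asm_disk
 WHERE group_number = (SELECT group_number
                         FROM v$asm_diskgroup
                        WHERE name = 'DATA');

-- Raise the repair window for the disk group (the default is 3.6 hours).
ALTER DISKGROUP data SET ATTRIBUTE 'disk_repair_time' = '24h';

-- For disks that are already offline, the timer can be set explicitly.
ALTER DISKGROUP data OFFLINE DISKS IN FAILGROUP alpha DROP AFTER 24h;

-- When the original disk is reachable again, bring it back online;
-- ASM then resynchronizes the extents that changed in the meantime.
ALTER DISKGROUP data ONLINE DISKS IN FAILGROUP alpha;
```

If the failed disk cannot be repaired within the window, ASM drops it and a replacement disk has to be added to the disk group to restore redundancy.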

In clustered ASM environments, if an ASM instance is not the first instance to mount the disk group, then using the MOUNT FORCE statement fails. This is because the disks have been accessed by another instance and the disks are not locally accessible.