Solaris 10 x86 – trying to replace disk in zpool

solaris zfs zpool

I am having trouble replacing a disk in an existing zpool on a system running Solaris 10 on x86. The zpool was originally created with two mirrored slices. One of the drives failed, so I physically swapped it with a new drive. I ran prtvtoc and fmthard to copy the disk label from the working drive onto the new drive:

prtvtoc /dev/rdsk/c1t0d0s2 >/tmp/c1t0d0s2.out
fmthard -s /tmp/c1t0d0s2.out /dev/rdsk/c1t1d0s2
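
For reference, the same label copy can also be done in one step by piping prtvtoc straight into fmthard (same source and target devices as above):

prtvtoc /dev/rdsk/c1t0d0s2 | fmthard -s - /dev/rdsk/c1t1d0s2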

Then I tried to bring the new drive online and got a warning about the device still being faulted:

$ zpool online pool c1t1d0s6 
warning: device 'c1t1d0s6' onlined, but remains in faulted state

The output of zpool status -v is:

NAME          STATE     READ WRITE CKSUM
pool          DEGRADED     0     0     0
  mirror-0    DEGRADED     0     0     0
    c1t0d0s6  ONLINE       0     0     0
    c1t1d0s6  UNAVAIL      0     0     0  corrupted data

(c1t1d0 is the replaced drive.)

Then I brought c1t1d0 offline again and tried running the zpool replace command, but this did not work, either:

$ zpool replace pool c1t1d0s6
invalid vdev specification
use '-f' to override the following errors:
/dev/dsk/c1t1d0s6 overlaps with /dev/dsk/c1t1d0s2

Does anyone know what's going on? Is it safe to use the '-f' flag?

Edit: After running zpool replace -f, I get:

  pool: pool
 state: DEGRADED
status: The pool is formatted using an older on-disk format.  The pool can
        still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
        pool will no longer be accessible on older software versions.
 scrub: none requested
config:

    NAME                STATE     READ WRITE CKSUM
    pool                DEGRADED     0     0     0
      mirror-0          DEGRADED     0     0     0
        c1t0d0s6        ONLINE       0     0     0
        replacing-1     UNAVAIL      0     0     0  insufficient replicas
          c1t1d0s6/old  OFFLINE      0     0     0
          c1t1d0s6      UNAVAIL      0   342     0  experienced I/O failures

I see errors on the new drive in iostat -e output. I guess the new drive might be bad, too?
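
The per-device detail behind that comes from iostat's extended error statistics; for the drive in question that would be something like:

iostat -En c1t1d0

which reports the soft, hard, and transport error counts for just that device.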

Edit 2: I don't know what's going on. I tried a different drive with the same procedure. After running zpool replace -f, the ZFS pool ran a scrub, but the status output is:

  pool: pool
 state: ONLINE
status: The pool is formatted using an older on-disk format.  The pool can
        still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
        pool will no longer be accessible on older software versions.
 scrub: scrub completed after 12h56m with 0 errors on Wed Aug 29 06:49:16 2012
config:

    NAME                STATE     READ WRITE CKSUM
    pool                ONLINE       0     0     0
      mirror-0          ONLINE       0     0     0
        c1t0d0s6        ONLINE       0     0     0
        replacing-1     ONLINE   5.54M 19.9M     0
          c1t1d0s6/old  UNAVAIL      0     0     0  corrupted data
          c1t1d0s6      UNAVAIL      0     0     0  corrupted data

After offlining c1t1d0s6, the zpool status output is:

  pool: pool
 state: ONLINE
status: The pool is formatted using an older on-disk format.  The pool can
        still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
        pool will no longer be accessible on older software versions.
 scrub: scrub completed after 12h56m with 0 errors on Wed Aug 29 06:49:16 2012
config:

    NAME                STATE     READ WRITE CKSUM
    pool                ONLINE       0     0     0
      mirror-0          ONLINE       0     0     0
        c1t0d0s6        ONLINE       0     0     0
        replacing-1     ONLINE   5.54M 19.9M     0
          c1t1d0s6/old  UNAVAIL      0     0     0  corrupted data
          c1t1d0s6      UNAVAIL      0     0     0  corrupted data

I don't get it. Shouldn't the system be able to replace c1t1d0s6 using the mirror on c1t0d0s6?

Best Answer

Did you clear the alerts in fmadm? And did you run zpool clear? It's safe to run zpool replace with the -f switch, but I think your replace command is wrong, unless you have already removed the bad disk.

http://docs.oracle.com/cd/E19253-01/819-5461/gbcet/index.html
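
A rough sketch of the sequence being suggested, using the pool and slice names from the question (the fault UUID is whatever fmadm faulty reports for the failed disk):

# list outstanding FMA faults and note the UUID for the failed-disk event
fmadm faulty

# tell FMA the faulted resource has been repaired/replaced (substitute the UUID from above)
fmadm repair <uuid>

# reset the error counters ZFS is holding for the device
zpool clear pool c1t1d0s6

# retry the replacement now that the new disk sits in the old one's slot
zpool replace -f pool c1t1d0s6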