FreeBSD – Why are the ZFS pools “unavailable”?

freebsd raid zfs zpool

I just replaced a hard drive that was part of two different redundant pools, and now both pools are unavailable…

Details:

  • There are four drives: 2x4 TB (da0 and ada1) and 2x3 TB (da1 and da2).
  • One pool is a RAIDZ1 consisting of both 3 TB drives in their entirety and the 3 TB partitions of the 4 TB drives.
  • The other pool is a mirror consisting of the remaining space on the two bigger drives.
  • I replaced one of the 4 TB drives with another of the same size (da0)…

I expected both pools to go into "degraded" mode until I split the replacement into its two parts and added each part to its pool.
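(By "splitting" I mean recreating the old two-partition layout on the new 4 TB disk, roughly along the lines of the sketch below; the sizes are illustrative, and the first partition only needs to be at least as large as the 3 TB raidz members:)

    # Illustrative sketch: recreate the old layout on the new 4 TB disk (da0).
    gpart create -s gpt da0
    gpart add -t freebsd-zfs -s 3T da0   # da0p1: for the raidz1 pool (aldan)
    gpart add -t freebsd-zfs da0         # da0p2: the remainder, for the mirror (lusterko)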

Instead, the computer rebooted unceremoniously and, upon coming back, both pools are "unavailable":

      pool: aldan
     state: UNAVAIL
    status: One or more devices could not be opened.  There are insufficient
            replicas for the pool to continue functioning.
    action: Attach the missing device and online it using 'zpool online'.
       see: http://illumos.org/msg/ZFS-8000-3C
      scan: none requested
    config:

            NAME                      STATE     READ WRITE CKSUM
            aldan                     UNAVAIL      0     0     0
              raidz1-0                UNAVAIL      0     0     0
                1257549909357337945   UNAVAIL      0     0     0  was /dev/ada1p1
                1562878286621391494   UNAVAIL      0     0     0  was /dev/da1
                8160797608248051182   UNAVAIL      0     0     0  was /dev/da0p1
                15368186966842930240  UNAVAIL      0     0     0  was /dev/da2
            logs
              4588208516606916331     UNAVAIL      0     0     0  was /dev/ada0e

      pool: lusterko
     state: UNAVAIL
    status: One or more devices could not be opened.  There are insufficient
            replicas for the pool to continue functioning.
    action: Attach the missing device and online it using 'zpool online'.
       see: http://illumos.org/msg/ZFS-8000-3C
      scan: none requested
    config:

            NAME                     STATE     READ WRITE CKSUM
            lusterko                 UNAVAIL      0     0     0
              mirror-0               UNAVAIL      0     0     0
                623227817903401316   UNAVAIL      0     0     0  was /dev/ada1p2
                7610228227381804026  UNAVAIL      0     0     0  was /dev/da0p2

I have now split the new drive, but attempts to "zpool replace" are rebuffed with "pool is unavailable" (the invocations are sketched after the device listing below). I'm pretty sure that, if I simply disconnect the new drive, both pools will become OK (if degraded). Why are they both "unavailable" now? All of the devices are online, according to camcontrol:

<ATA TOSHIBA MG03ACA4 FL1A>        at scbus0 target 0 lun 0 (pass0,da0)
<ATA Hitachi HUS72403 A5F0>        at scbus0 target 1 lun 0 (pass1,da1)
<ATA TOSHIBA HDWD130 ACF0>         at scbus0 target 2 lun 0 (pass2,da2)
<M4-CT128M4SSD2 0309>              at scbus1 target 0 lun 0 (pass3,ada0)
<MB4000GCWDC HPGI>                 at scbus2 target 0 lun 0 (pass4,ada1)
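For reference, the commands being refused are of this form (the old devices referred to by the GUIDs that zpool status prints above); the exact invocations may have differed slightly:

    # Both are rejected with "pool is unavailable":
    zpool replace aldan    8160797608248051182 da0p1
    zpool replace lusterko 7610228227381804026 da0p2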

The OS is FreeBSD-11.3-STABLE/amd64. What's wrong?

Update: no, I didn't explicitly offline the device(s) before unplugging the disk, and it is already on its way back to Amazon. I'm surprised that such offlining is necessary; shouldn't ZFS be able to handle the sudden death of any drive? And shouldn't it, likewise, be prepared for a technician replacing the failed drive with another? Why is it throwing a fit like this?

I have backups and can rebuild the pools from scratch, but I'd like to figure out how to avoid that, or, if it cannot be avoided, to file a proper bug report…

I unplugged the new drive completely, but the pools' status hasn't changed… Maybe I need to reboot; whether or not that helps, it is quite a disappointment.

Update 2: multiple reboots, with and without the new disk attached, did not help. However, zpool import lists both pools just as I'd expect them: degraded (but available!). For example:

   pool: lusterko
     id: 11551312344985814621
  state: DEGRADED
 status: One or more devices are missing from the system.
 action: The pool can be imported despite missing or damaged devices.  The
        fault tolerance of the pool may be compromised if imported.
   see: http://illumos.org/msg/ZFS-8000-2Q
 config:

        lusterko                  DEGRADED
          mirror-0                DEGRADED
            ada1p2                ONLINE
            12305582129131953320  UNAVAIL  cannot open

But zpool status continues to insist that all devices are unavailable… Any hope?
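Would a forced re-import bring them back? Something along these lines, per zpool(8):

    # Sketch only: discard the stale cached state and re-import each pool.
    zpool export aldan        # may itself refuse while the pool shows as UNAVAIL
    zpool import -f aldan
    zpool export lusterko
    zpool import -f lusterko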

Best Answer

Perhaps you also did not offline the old drive before removing it. (It is possible that ZFS considers the logical drives (your pools) corrupted while the controller thinks they are fine; this can happen when the disks' reported geometries differ slightly. A rare case, but it does happen.)

To get out of the situation (a consolidated command sketch follows the list):

  • get the name of the UNAVAIL disk from zpool status
  • use diskinfo to identify the physical location of the drive noted above
  • reconfigure it (on Solaris/illumos, cfgadm -c unconfigure followed by cfgadm -c configure; on FreeBSD, a bus rescan with camcontrol serves a similar purpose)
  • bring the new disk online with zpool online <pool> <device>
  • run zpool replace <pool> <old-device> <new-device> to replace the disk; zpool status <pool> should then show the device online and, eventually, a completed resilver
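Put together, and using the pool name, GUID, and device names from the question purely as an example (adjust them to whatever your own zpool status shows), the sequence would look roughly like this on FreeBSD:

    # Sketch only: the values below are taken from the question, not universal.
    zpool status aldan                               # note the GUID of the UNAVAIL vdev
    diskinfo -v /dev/da0                             # confirm which physical disk this is
    camcontrol rescan all                            # re-detect the swapped disk (FreeBSD)
    zpool online aldan 8160797608248051182           # try to bring the missing vdev back online
    zpool replace aldan 8160797608248051182 da0p1    # or replace it with the new partition
    zpool status aldan                               # should now show a resilver in progress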