Debian – zfs – hot spare, replace, detach: resource is busy

debian, zfs, zfsonlinux

I'm pretty new to zfsonlinux. I've just succeeded in setting up a brand-new server with a Debian root on ZFS. Everything is working fine, but I have an issue with hot spares and disk replacement.

Here is my pool:

NAME                            STATE     READ WRITE CKSUM
mpool                           ONLINE       0     0     0
  mirror-0                      ONLINE       0     0     0
    ata-ST1XXXXXXXXXXA-part1    ONLINE       0     0     0
    ata-ST1XXXXXXXXXXB-part1    ONLINE       0     0     0
  mirror-1                      ONLINE       0     0     0
    ata-ST1XXXXXXXXXXC-part1    ONLINE       0     0     0
    ata-ST1XXXXXXXXXXD-part1    ONLINE       0     0     0
spares  
  ata-ST1XXXXXXXXXXE-part1      AVAIL   
  ata-ST1XXXXXXXXXXF-part1      AVAIL  
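
For reference, this layout corresponds to a create command roughly along these lines (illustrative only: the full by-id names go in place of the shortened ones, and the actual root-on-ZFS setup of course involves more options):

# zpool create mpool \
      mirror ata-ST1XXXXXXXXXXA-part1 ata-ST1XXXXXXXXXXB-part1 \
      mirror ata-ST1XXXXXXXXXXC-part1 ata-ST1XXXXXXXXXXD-part1 \
      spare  ata-ST1XXXXXXXXXXE-part1 ata-ST1XXXXXXXXXXF-part1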

Now I can start with the real fun: disk pulling! I unplug disk C. I still have a working pool, but it is DEGRADED (as expected):

NAME                            STATE     READ WRITE CKSUM
mpool                           DEGRADED     0     0     0
  mirror-0                      ONLINE       0     0     0
    ata-ST1XXXXXXXXXXA-part1    ONLINE       0     0     0
    ata-ST1XXXXXXXXXXB-part1    ONLINE       0     0     0
  mirror-1                      DEGRADED     0     0     0
    ata-ST1XXXXXXXXXXC-part1    UNAVAIL      0     0     0
    ata-ST1XXXXXXXXXXD-part1    ONLINE       0     0     0
spares  
  ata-ST1XXXXXXXXXXE-part1      AVAIL   
  ata-ST1XXXXXXXXXXF-part1      AVAIL   

So far, so good. But when I try to replace disk C with, say, disk E, the pool stays DEGRADED anyway.

# zpool replace mpool ata-ST1XXXXXXXXXXC-part1 ata-ST1XXXXXXXXXXE-part1
cannot open '/dev/disk/by-id/ata-ST1XXXXXXXXXXE-part1': Device or resource busy
(and, after a few seconds:)
Make sure to wait until resilver is done before rebooting.
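
To follow the resilver while waiting, I just poll the pool status, e.g.:

# watch -n 5 zpool status mpool
(re-runs zpool status every 5 seconds until the resilver reports completion and the error counters can be checked)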

So I wait a few seconds to let it resilver (it finishes with 0 errors), and then I've got:

NAME                                STATE     READ WRITE CKSUM
mpool                               DEGRADED     0     0     0
  mirror-0                          ONLINE       0     0     0
    ata-ST1XXXXXXXXXXA-part1        ONLINE       0     0     0
    ata-ST1XXXXXXXXXXB-part1        ONLINE       0     0     0
  mirror-1                          DEGRADED     0     0     0
    spare-0                         UNAVAIL
        ata-ST1XXXXXXXXXXC-part1    UNAVAIL      0     0     0
        ata-ST1XXXXXXXXXXE-part1    ONLINE       0     0     0
    ata-ST1XXXXXXXXXXD-part1        ONLINE       0     0     0
spares  
  ata-ST1XXXXXXXXXXE-part1          INUSE       currently in use   
  ata-ST1XXXXXXXXXXF-part1          AVAIL   

Then, if I zpool detach the C disk (as explained here), my pool comes back ONLINE and everything works fine (but with a pool of only 5 HDDs).
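
For completeness, the detach in question is just:

# zpool detach mpool ata-ST1XXXXXXXXXXC-part1
(after this, spare E stays in mirror-1 as a regular member and the pool reports ONLINE again)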


So here are my questions:

  1. Why is replacing the C disk not enough to rebuild a full pool? As
     explained on the Oracle blog, and here too, I expected that I would not
     have to detach the disk for ZFS to rebuild the pool properly (and it is
     far better to keep traces of the unplugged disk in zpool status, for
     maintenance convenience).
  2. Why does zpool keep telling me that the spare disks are "busy" (they
     really are not)?
  3. See below: how can I automatically get my spare disk back?

EDIT: Even worse, regarding question 1: when I plug disk C back in, ZFS does not put my spare back! So I'm left with one disk fewer:

NAME                                STATE     READ WRITE CKSUM
mpool                               ONLINE       0     0     0
  mirror-0                          ONLINE       0     0     0
    ata-ST1XXXXXXXXXXA-part1        ONLINE       0     0     0
    ata-ST1XXXXXXXXXXB-part1        ONLINE       0     0     0
  mirror-1                          ONLINE       0     0     0
    ata-ST1XXXXXXXXXXE-part1        ONLINE       0     0     0
    ata-ST1XXXXXXXXXXD-part1        ONLINE       0     0     0
spares  
  ata-ST1XXXXXXXXXXF-part1          AVAIL 

Best Answer

Short version:

You have to do it the other way round: replace the failed pool disk (with a new disk or with itself), and after that detach the spare disk from the pool (so that it becomes available to all vdevs again). I assume the spare stays busy as long as the disk it stood in for has not itself been replaced. Detaching this disk or another disk only makes it worse.
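
With the disk names from the question, that order would look roughly like this (a sketch; the resilver has to finish before the detach, and the replacement can of course be a brand-new disk instead of C itself):

# zpool replace mpool ata-ST1XXXXXXXXXXC-part1
(replaces the failed pool disk with itself; append a new by-id name to replace it with a different disk)
# zpool detach mpool ata-ST1XXXXXXXXXXE-part1
(detaches the spare from the spare-0 vdev so it returns to AVAIL and again covers all vdevs)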

Also, as far as I remember, ZoL has no automatic attach/detach for spares based on events; you have to script your own handling or use something like the ZFS Event Daemon (zed).
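
On Debian, a first sanity check might look something like this (a sketch; the unit name is the one shipped with the ZoL packages as far as I know, and note that autoreplace only covers a new disk appearing in the same physical slot, it is not spare handling itself):

# systemctl status zfs-zed
(verify that the ZFS event daemon is actually running, so it can react to fault events at all)
# zpool set autoreplace=on mpool
(optional pool property: automatically start a replace when a disk shows up in the place of a removed one)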


Long version:

Regarding your follow-up comment

If disk C is FAULTED, OK, let's replace it and then detach it. But that screws up my pool, because zpool didn't remember that I used to have a C disk in mirror-1 :/

That depends on how you see it. If you detach a disk from a mirror, it is not relevant anymore. It may be defective, it may get used in another system, it may get replaced under manufacturer warranty. Whatever is done with it, your pool does not care.

If you just detach the disk, the vdev is left degraded; if you instead supply another disk (from an automatic spare, a manual spare or a fully manual replacement), this disk assumes the role of the old one (hence the term replace: the new disk fully takes over the old disk's position and duties).

If you want, you can add the detached disk back to the pool, for example as a spare (so the initial situation is reversed).
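
With the names from the question, that would simply be:

# zpool add mpool spare ata-ST1XXXXXXXXXXC-part1
(adds the previously detached C disk back to the pool, this time as a hot spare)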

How spares work on ZFS systems

Spares only really make sense with automatic activation. ZFS storage arrays as designed by Sun had many similar disks; counts of 18 to 48 disks were not uncommon. They consisted of multiple vdevs, for example 4 x RAID-Z2 for a 24-disk system. Additionally, they were managed by a dedicated administrator, but nobody can work 24/7, so they needed something for first response, and it had to work across all vdevs, because any disk might fail at any moment.

So, if a disk in your second vdev fails late at night, the system automatically takes one of the two configured spares and replaces the faulted disk, so that the pool keeps working as usual (same performance for customers using, for example, a website whose database runs on it). In the morning, the admin reads the failure report and troubleshoots the cause (the command equivalent of each case is sketched after this list):

  • If the disk has died, he might replace it with a replacement disk in the same tray, let it resilver, and the hot spare is then automatically retired back to spare duty, watching for another dead disk where it can act as first response.
  • If no replacement disk is available, he might even make the spare the new data disk, reducing the number of spares temporarily by 1 (until another replacement disk is shipped which will become the new spare).
  • If it was just a controller error dropping the disk, he might even replace it with itself, triggering the same spare renewal as in the first case.
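
Sketched as commands (placeholder names: <pool>, <failed-disk> and <new-disk> stand for whatever names apply on your system; the automatic retirement of the spare in the first and third case assumes automatic spare management as described above):

# zpool replace <pool> <failed-disk> <new-disk>
(case 1: dead disk, physically swapped for a new one; after the resilver the spare goes back to spare duty)
# zpool detach <pool> <failed-disk>
(case 2: no replacement at hand; the spare permanently becomes the data disk, reducing the spare count by one)
# zpool replace <pool> <failed-disk>
(case 3: the disk was only dropped by the controller; replace it "with itself")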

If you think about it the way the engineers designed it for the most common anticipated usage scenario, it makes much more sense. That does not mean you have to do exactly as described; it just might be the reason for the behavior.

Answers to your questions

Why is replacing the C disk not enough to rebuild a full pool? As explained on the Oracle blog, and here too, I expected that I would not have to detach the disk for ZFS to rebuild the pool properly (and it is far better to keep traces of the unplugged disk in zpool status, for maintenance convenience).

As seen above, you can either replace the pool disk (with another disk or with itself), in which case the spare is freed and continues to work as a spare, or you can detach the pool disk, in which case the spare permanently assumes the role of a pool disk and you have to add another spare by hand with zpool add poolname spare diskname (which can be the detached disk or a new one).

Why does zpool keep telling me that the spare disks are "busy" (they really are not)?

I assume it was because of outstanding I/O. That would explain why the operation took just a moment to complete.

See below: how can I automatically get my spare disk back?

  • Enable automatic spare replacement (the default on Solaris/illumos, a bit of a hassle on Linux); see the sketch after this list.
  • Replace the faulted pool disk with zpool replace (instead of detaching it). The detach step is then only needed for the spare disk, after the pool disk has been replaced, and only if you do not have automatic spare management (which makes no sense in my eyes except for specific pool layouts and admin situations).
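
Regarding the first point, my understanding is that on the ZoL releases current at the time of writing the spare zedlets are configured in /etc/zfs/zed.d/zed.rc with settings along the following lines. Treat the exact variable names as an assumption and check the comments in your local zed.rc, as they have changed between releases:

# in /etc/zfs/zed.d/zed.rc (sourced as shell), then restart zed:
ZED_SPARE_ON_IO_ERRORS=1
ZED_SPARE_ON_CHECKSUM_ERRORS=10
(kick in a hot spare after the given number of I/O or checksum errors on a device)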