I'm pretty new to ZFS on Linux. I've just succeeded in setting up a brand-new server with a Debian root on ZFS. Everything is working fine, but I have an issue with hot spares and replacing disks.
Here is my pool:
NAME                          STATE     READ WRITE CKSUM
mpool                         ONLINE       0     0     0
  mirror-0                    ONLINE       0     0     0
    ata-ST1XXXXXXXXXXA-part1  ONLINE       0     0     0
    ata-ST1XXXXXXXXXXB-part1  ONLINE       0     0     0
  mirror-1                    ONLINE       0     0     0
    ata-ST1XXXXXXXXXXC-part1  ONLINE       0     0     0
    ata-ST1XXXXXXXXXXD-part1  ONLINE       0     0     0
spares
  ata-ST1XXXXXXXXXXE-part1    AVAIL
  ata-ST1XXXXXXXXXXF-part1    AVAIL
Now I can start with the real fun: disk pulling! I unplug disk C. I still have a working pool, but it is DEGRADED (as expected):
NAME                          STATE     READ WRITE CKSUM
mpool                         DEGRADED     0     0     0
  mirror-0                    ONLINE       0     0     0
    ata-ST1XXXXXXXXXXA-part1  ONLINE       0     0     0
    ata-ST1XXXXXXXXXXB-part1  ONLINE       0     0     0
  mirror-1                    DEGRADED     0     0     0
    ata-ST1XXXXXXXXXXC-part1  UNAVAIL      0     0     0
    ata-ST1XXXXXXXXXXD-part1  ONLINE       0     0     0
spares
  ata-ST1XXXXXXXXXXE-part1    AVAIL
  ata-ST1XXXXXXXXXXF-part1    AVAIL
So far, so good. But when I try to replace disk C with, say, disk E, I'm still stuck with a DEGRADED pool:
# zpool replace mpool ata-ST1XXXXXXXXXXC-part1 ata-ST1XXXXXXXXXXE-part1
cannot open '/dev/disk/by-id/ata-ST1XXXXXXXXXXE-part1': Device or resource busy
(and after a few seconds)
Make sure to wait until resilver is done before rebooting.
So I wait a few seconds for the resilver to finish (with 0 errors), and then I get:
NAME                            STATE     READ WRITE CKSUM
mpool                           DEGRADED     0     0     0
  mirror-0                      ONLINE       0     0     0
    ata-ST1XXXXXXXXXXA-part1    ONLINE       0     0     0
    ata-ST1XXXXXXXXXXB-part1    ONLINE       0     0     0
  mirror-1                      DEGRADED     0     0     0
    spare-0                     UNAVAIL
      ata-ST1XXXXXXXXXXC-part1  UNAVAIL      0     0     0
      ata-ST1XXXXXXXXXXE-part1  ONLINE       0     0     0
    ata-ST1XXXXXXXXXXD-part1    ONLINE       0     0     0
spares
  ata-ST1XXXXXXXXXXE-part1      INUSE     currently in use
  ata-ST1XXXXXXXXXXF-part1      AVAIL
Then if I zpool detach the C disk (as explained here), my pool becomes ONLINE again and everything works fine (but with a pool of only 5 disks).
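For reference, the detach step I ran is presumably something like the following (disk names follow the scheme used in the rest of this question):

```shell
# Detach the unplugged C disk from mirror-1; the spare that replaced it
# (disk E) then becomes a permanent member of that mirror:
zpool detach mpool ata-ST1XXXXXXXXXXC-part1
```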
So here are my questions:
- Why is replacing the C disk not enough to rebuild a full pool? As explained on the Oracle blog (and here too), I was expecting that I would not have to detach the disk for ZFS to rebuild the pool properly (and it is far better to keep traces of the unplugged disk in zpool status, for maintenance convenience).
- Why does zpool keep telling me that the spare disks are "busy" (they truly are not)?
- See below: how can I automatically get my spare disk back?
EDIT: Even worse for question 1 => when I plug disk C back in, ZFS doesn't give me my spare back! So I'm left with one disk fewer:
NAME                          STATE     READ WRITE CKSUM
mpool                         ONLINE       0     0     0
  mirror-0                    ONLINE       0     0     0
    ata-ST1XXXXXXXXXXA-part1  ONLINE       0     0     0
    ata-ST1XXXXXXXXXXB-part1  ONLINE       0     0     0
  mirror-1                    ONLINE       0     0     0
    ata-ST1XXXXXXXXXXE-part1  ONLINE       0     0     0
    ata-ST1XXXXXXXXXXD-part1  ONLINE       0     0     0
spares
  ata-ST1XXXXXXXXXXF-part1    AVAIL
Best Answer
Short version:
You have to do it the other way round: replace the failed pool disk (with a new disk or with itself), and after that detach the spare disk from the pool (so that it becomes available to all vdevs again). I assume the spare stays busy as long as the disk it replaced has not itself been replaced. Detaching that disk (or another disk) only makes things worse.
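A minimal sketch of that order of operations, using the disk names from the question (the device names are illustrative, and these commands of course need a real pool to run against):

```shell
# 1. Replace the failed disk in place. With a single device argument,
#    zpool replace rebuilds onto the same disk slot -- used after you
#    plug the (repaired or new) disk back into the same position:
zpool replace mpool ata-ST1XXXXXXXXXXC-part1

# 2. Wait until the resilver is finished:
zpool status mpool

# 3. Only now detach the spare, so it returns to AVAIL for all vdevs:
zpool detach mpool ata-ST1XXXXXXXXXXE-part1
```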
Also, as far as I remember, ZFS on Linux has no automatic attach/detach for spares based on events out of the box; you have to script your own solution or use something like the ZFS Event Daemon (ZED).
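On ZoL releases of that era, ZED was configured through /etc/zfs/zed.rc; whether these exact variables exist depends on your version, so treat the names below as an assumption and check the zed.rc shipped with your release:

```shell
# /etc/zfs/zed.rc -- excerpt (variable names vary between ZoL releases;
# verify against the zed.rc installed on your system):
ZED_EMAIL="root@localhost"        # where zed sends failure reports
ZED_SPARE_ON_IO_ERRORS=1          # activate a hot spare on I/O errors
ZED_SPARE_ON_CHECKSUM_ERRORS=10   # ...or after this many checksum errors
```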
Long version:
Regarding your follow-up comment
That depends on how you see it. If you detach a disk from a mirror, it is not relevant anymore. It may be defective, it may get used in another system, it may get replaced under manufacturer warranty. Whatever is done with it, your pool does not care.
If you just detach the disk, the pool will be degraded; if you instead supply another disk (an automatic spare, a manual spare, or a fully manual replacement), that disk assumes the role of the old one (hence the term "replace": the new disk fully takes over the old disk's position and duties). If you want, you can add the detached disk back to the pool, for example as a spare (so the initial situation is reversed).
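Adding the detached disk back as a spare is a one-liner, sketched here with the question's disk names:

```shell
# Re-add the old C disk as a hot spare available to the whole pool:
zpool add mpool spare ata-ST1XXXXXXXXXXC-part1
```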
How spares work on ZFS systems
Spares only really make sense with automatic activation. ZFS storage arrays as designed by Sun had many similar disks; counts of 18 to 48 disks were not uncommon. They consisted of multiple vdevs, for example 4 x RAID-Z2 for a 24-disk system. Additionally, they were managed by a dedicated administrator, but nobody can work 24/7. Therefore they needed something as a first response, and it had to work across all vdevs, because any disk might fail at any moment.
So, if a disk in your second vdev fails late at night, the system automatically takes one of the two configured spares and replaces the faulted disk, so that the pool works as usual (same performance, for example, for customers using a website whose database runs on it). In the morning, the admin reads the failure report and troubleshoots the cause.
If you think about it the way the engineers designed it for the most common anticipated usage scenario, it makes much more sense. That does not mean you have to do exactly as described; it just might be the reason for the behavior.
Answers to your questions
- As seen above, you can either replace the pool disk with another disk or with itself (the spare is freed and continues to work as a spare), or you can detach the pool disk, in which case the spare permanently assumes the role of a pool disk and you have to add another spare by hand with zpool add poolname spare diskname (which can be the detached disk or a new one).
- I assume it was because of outstanding IO. That would explain why it took a moment for the operation to complete.
- Replace the pool disk with zpool replace (instead of detaching it). The detach step is only needed for the spare disk, after the pool disk has been replaced, and only if you do not have automatic spare management (which in my eyes makes no sense except for specific pool layouts and admin situations).