When does ZFS “autoreplace” take effect

hard-drive, ubuntu-16.04, zfs

Background

autoreplace is documented as follows:

autoreplace=on | off
Controls automatic device replacement. If set to "off", device replacement must be initiated by the administrator by using the "zpool replace" command. If set to "on", any new device, found in the same physical location as a device that previously belonged to the pool, is automatically formatted and replaced. The default behavior is "off". This property can also be referred to by its shortened column name, "replace".

The following is the current status of that setting in the pool I'm interested in:

root@[...]:/# zpool get autoreplace zfs-pool
NAME      PROPERTY     VALUE    SOURCE
zfs-pool  autoreplace  on       local

So it seems to be enabled.
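For reference, a minimal sketch of how the property is checked and set (pool name zfs-pool as above); if it were still off, enabling it looks like this:

# check the current value of the property
zpool get autoreplace zfs-pool

# enable automatic replacement of devices re-inserted at the same physical location
zpool set autoreplace=on zfs-pool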

Observations

One disk was removed because of S.M.A.R.T.-related errors, and ZFS correctly recognised that the device was no longer available; the mirror containing that disk changed to DEGRADED. Because I had multiple spare disks, I used zpool replace zfs-pool FAULTY_DISK SPARE_DISK to temporarily put one spare in place (see the sketch below). That was necessary because on the Ubuntu 16.04 I'm running, automatic use of spares doesn't work properly, or even at all.
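The manual spare workflow I mean is roughly the following; this is only a sketch, FAULTY_DISK and SPARE_DISK stand for the actual by-path device names, and the zpool add line is only needed if the disk isn't already configured as a spare:

# add a disk to the pool as a hot spare (if not already configured)
zpool add zfs-pool spare /dev/disk/by-path/pci-0000:15:00.0-scsi-0:1:0:3-part3

# manually pull the spare in for the faulted device, since the spare
# did not kick in automatically on this Ubuntu 16.04 setup
zpool replace zfs-pool FAULTY_DISK SPARE_DISK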

After the mirror was in sync again and the new disk had been physically attached, I restarted the system, because otherwise the controllers in use prevent access to the new disk. During startup the controllers recognize new disks and ask whether they should be enabled; if they are, the new disk is available to the OS afterwards. The disk was initialized, partitions were created, etc., and it was fully available at the same physical slot as the faulty one before. The important point is that the OS used the same names for the disk as before: /dev/sdf and /dev/disk/by-path/pci-0000:15:00.0-scsi-0:1:0:1-part*
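To confirm that the replacement really came back under the same names, a check along these lines can be used (a sketch; the device names are the ones mentioned above):

# confirm the by-path symlink points at the newly inserted disk
ls -l /dev/disk/by-path/ | grep "pci-0000:15:00.0-scsi-0:1:0:1"

# confirm the kernel device name and partition layout
lsblk /dev/sdf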

Nevertheless, ZFS didn't automatically use the new disk to replace the former one, even though the pool's status output listed the old disk's identifier as missing along with the path it used to have, which was the same path the new disk had by then been given. I had to issue the replacement manually using zpool replace zfs-pool pci-0000:15:00.0-scsi-0:1:0:1-part3. That made ZFS put the new disk into the correct mirror (because of the identical path), and after resilvering the spare was removed automatically as well.
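For completeness, the manual steps were roughly the following (a sketch; the final zpool detach is only needed if the hot spare is not released automatically once the resilver finishes):

# replace the missing vdev with the new disk that re-appeared at the same path
zpool replace zfs-pool pci-0000:15:00.0-scsi-0:1:0:1-part3

# watch the resilver progress
zpool status zfs-pool

# if the hot spare were not released automatically, detach it manually
zpool detach zfs-pool pci-0000:15:00.0-scsi-0:1:0:3-part3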

NAME                                         STATE     READ WRITE CKSUM
zfs-pool                                     DEGRADED     0     0     0
  mirror-0                                   ONLINE       0     0     0
    pci-0000:05:00.0-scsi-0:1:0:0-part3      ONLINE       0     0     0
    pci-0000:15:00.0-scsi-0:1:0:0-part3      ONLINE       0     0     0
  mirror-1                                   DEGRADED     0     0     0
    pci-0000:05:00.0-scsi-0:1:0:1-part3      ONLINE       0     0     0
    spare-1                                  DEGRADED     0     0     0
      replacing-0                            DEGRADED     0     0     0
        11972718311040401135                 UNAVAIL      0     0     0  was /dev/disk/by-path/pci-0000:15:00.0-scsi-0:1:0:1-part3/old
        pci-0000:15:00.0-scsi-0:1:0:1-part3  ONLINE       0     0     0  (resilvering)
      pci-0000:15:00.0-scsi-0:1:0:3-part3    ONLINE       0     0     0
  mirror-2                                   ONLINE       0     0     0
    pci-0000:05:00.0-scsi-0:1:0:2-part3      ONLINE       0     0     0
    pci-0000:15:00.0-scsi-0:1:0:2-part3      ONLINE       0     0     0
spares
  pci-0000:05:00.0-scsi-0:1:0:3-part3        AVAIL
  pci-0000:15:00.0-scsi-0:1:0:3-part3        INUSE     currently in use

Questions

While the command I used is documented to work that way, I wonder why it was necessary at all with autoreplace enabled. Shouldn't autoreplace have performed that step on its own once the new disk was successfully partitioned? Or is the autoreplace property required for the issued command to work at all? It's not documented as relying on that setting:

zpool replace [-f] pool old_device [new_device]
[…]
new_device is required if the pool is not redundant. If new_device is not specified, it defaults to old_device. This form of replacement is useful after an existing disk has failed and has been physically replaced. In this case, the new disk may have the same /dev/dsk path as the old device, even though it is actually a different disk. ZFS recognizes this.

Best Answer

ZFS depends on ZED to handle auto-replacing a failing/disconnected disk, so you must make sure ZED is running. However, the latest 0.8.x ZED releases have a bug which prevents ZFS from correctly auto-partitioning the replaced disk. Note that this bug is not present in the 0.7.x ZFS/ZED releases.
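A minimal check that ZED is actually running on an Ubuntu/systemd machine (a sketch; zfs-zed is the service name used by Ubuntu's ZFS packaging, adjust if your distribution names it differently):

# check whether the ZFS event daemon is running
systemctl status zfs-zed

# enable and start it if it is not
systemctl enable --now zfs-zed

# for debugging, ZED can also be run in the foreground with verbose output
zed -F -v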

EDIT: some answers based on your comments below:

  • does ZED autoreplace "internally" somehow, or are scripts necessary, as for using hot spares and other actions? ZED handles autoreplace internally in its FMA (fault management agent). In other words, no scripts are required in the agent directory. Those scripts generally run after the FMA and are meant to start corollary actions such as starting a scrub, logging to syslog, etc.

  • where can I find details about the auto-partitioning applied in case of autoreplace? I'm passing individual partitions to ZFS instead of whole disks. Auto-partitioning only works when passing whole disks to ZFS (note that it is ZFS itself, rather than ZED, that partitions the affected disks). When passing existing partitions to ZFS (i.e. using sda1 as a vdev), the partition table is not touched at all; see the sketch after this list.
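To illustrate the difference, here is a sketch with hypothetical device and pool names (testpool, sdx, sdy): with a whole-disk vdev, ZFS labels the disk and creates its own partitions, which is what makes auto-partitioning on autoreplace possible; with a partition vdev, ZFS uses the partition as-is and never touches the partition table.

# whole-disk vdev: ZFS partitions the disk itself
# (typically a large data partition 1 and a small reserved partition 9)
zpool create testpool mirror /dev/sdx /dev/sdy

# partition vdev: ZFS uses the given partitions as-is and leaves the partition table alone
zpool create testpool mirror /dev/sdx3 /dev/sdy3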