Linux – Create zfs pool that allows replacing one of the disks with a slightly smaller disk

centos7, linux, raid, zfs, zfsonlinux

This is a question regarding zfs on Linux (CentOS 7). I have a very simple setup with two 8 TB disks, one disk mirroring the other.

zpool create -f -o ashift=12 $zpoolName mirror $disksById

In case one of the disks needs to be replaced, the replacement disk must be of equal or greater size than the smallest of the two disks in the configuration, according to the zpool manual page. And from what I have understood, the exact size usually differs a bit between drives of different make and model (and model revision), even if they are all labelled 8 TB. However, I would like to be able to replace a disk with any other 8 TB disk, not necessarily of the same make and model.
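For reference, the replacement I have in mind would be something along these lines (the by-id paths here are placeholders, not my actual devices):

# Swap one mirror member for a new disk; if the new disk is even slightly
# smaller, zfs refuses with an error along the lines of "device is too small"
zpool replace $zpoolName /dev/disk/by-id/wwn-0xOLDDISK /dev/disk/by-id/wwn-0xNEWDISK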

How do I achieve this?

I would have expected an option to the zpool create command so that not the entire disk is used for the pool, leaving some slack, but I cannot find such an option. The only suggestion that I have seen is partitioning the disk before creating the pool, creating one "pool" partition and one "slack" partition, but I've read that this will affect disk performance as the disk cache cannot be used properly by zfs, so I would like to avoid that.

Best Answer

The only suggestion that I have seen is partitioning the disk before creating the pool, creating one "pool" partition and one "slack" partition

This is the correct answer.
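For the record, a rough sketch of the manual approach, assuming hypothetical by-id paths and leaving 1 GiB unused at the end of each disk:

# Label each disk and create one partition that stops 1 GiB short of the end
# (this destroys any existing partition table and data on the disks)
for disk in /dev/disk/by-id/wwn-0xAAAA /dev/disk/by-id/wwn-0xBBBB; do
    parted -s -a optimal "$disk" mklabel gpt
    parted -s -a optimal "$disk" mkpart zfs 1MiB -1GiB
done

# wait for udev to create the new -part1 symlinks, then mirror the partitions
udevadm settle
zpool create -f -o ashift=12 $zpoolName mirror \
    /dev/disk/by-id/wwn-0xAAAA-part1 \
    /dev/disk/by-id/wwn-0xBBBB-part1

Leaving a full gibibyte is arbitrary; the point is simply that the pool partition ends short of the last sector, so a marginally smaller replacement disk still has room for it.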

but I've read that this will affect disk performance as the disk cache cannot be used properly by zfs.

This is a misunderstanding. Using a partition rather than a full disk only affects performance if the partition is misaligned, and getting that wrong takes some real determination on the user's part with any vaguely modern partition editor. Linux and BSD fdisk, sfdisk, and gparted all align new partitions to sensible boundaries unless outright forced not to.
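If you want to double-check an existing partition, parted can report alignment directly (the device name is just an example):

# Report whether partition 1 of the disk sits on the optimal I/O boundary
parted /dev/sdd align-check optimal 1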

Further, if you look closely at a disk that's been fed whole to zfs, you'll notice that zfs has actually partitioned it itself. Example:

root@banshee:~# zpool status data
  pool: data
 state: ONLINE
  scan: scrub repaired 0 in 27h54m with 0 errors on Mon Mar 13 05:18:20 2017
config:

    NAME                        STATE     READ WRITE CKSUM
    data                        ONLINE       0     0     0
      mirror-0                  ONLINE       0     0     0
        wwn-0x50014ee206fd9549  ONLINE       0     0     0
        wwn-0x50014ee2afb368a9  ONLINE       0     0     0
      mirror-1                  ONLINE       0     0     0
        wwn-0x50014ee25d2510d4  ONLINE       0     0     0
        wwn-0x5001517bb29d5333  ONLINE       0     0     0

errors: No known data errors

root@banshee:~# ls -l /dev/disk/by-id | grep 510d4
lrwxrwxrwx 1 root root  9 Mar 22 15:57 wwn-0x50014ee25d2510d4 -> ../../sdd
lrwxrwxrwx 1 root root 10 Mar 22 15:57 wwn-0x50014ee25d2510d4-part1 -> ../../sdd1
lrwxrwxrwx 1 root root 10 Mar 22 15:57 wwn-0x50014ee25d2510d4-part9 -> ../../sdd9

As you can see, ZFS has already partitioned the raw disks in the pool. The pool uses partition 1; partition 9 is left slack.

root@banshee:~# sfdisk -d /dev/sdd
label: gpt
label-id: B2DED677-DB67-974C-80A6-070B72EB8CFB
device: /dev/sdd
unit: sectors
first-lba: 34
last-lba: 3907029134

/dev/sdd1 : start=        2048, size=  3907010560, type=6A898CC3-1DD2-11B2-99A6-080020736631, uuid=A570D0A4-EA32-F64F-80D8-7479D918924B, name="zfs"
/dev/sdd9 : start=  3907012608, size=       16384, type=6A945A3B-1DD2-11B2-99A6-080020736631, uuid=85D0957B-65AF-6B4A-9F1B-F902FE539170

sdd9 is 16384 sectors long. sfdisk counts in 512-byte logical sectors (this is a 4K-physical, 512e drive), so that comes out to 8M, and any disk that's no more than 8M-ish smaller than the existing disk should be fine as a replacement for this one, should it fail.
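If you want to sanity-check that arithmetic on your own drive, the kernel exposes the sector sizes directly (sdd is just the example device from above):

# A 512e drive reports a 512-byte logical and 4096-byte physical sector size
cat /sys/block/sdd/queue/logical_block_size
cat /sys/block/sdd/queue/physical_block_size

# sfdisk counts in logical sectors, so the reserved partition works out to:
echo $(( 16384 * 512 / 1024 / 1024 ))   # -> 8 (MiB)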