ZFS and SAN: issue with data scrubbing

multipath proxmox storage-area-network ubuntu-20.04 zfs

Working as scientists in a corporate environment, we are provided with storage resources from a SAN within an Ubuntu 20.04 virtual machine (Proxmox). The SAN controller is passed directly to the VM (PCIe passthrough).

The SAN itself uses hardware RAID 60 (no other option is given to us) and presents us with 380 TB that we can split into a number of LUNs. We would like to benefit from ZFS compression and snapshot features. We opted for 30 x 11 TB LUNs, which we organized as striped RAID-Z. The setup is redundant (two servers), we have backups, and performance is good, which led us to choose striped RAID-Z over the usual striped mirrors.

Independently of the ZFS geometry, we have noticed that a heavy write load (> 1 GB/s) during a ZFS scrub results in disk errors and eventually in faulted devices. By looking at the files that showed errors, we could link the problem to the scrub trying to read data still held in the SAN's cache. With a moderate load during the scrub, the process completes without any errors.

Are there configuration parameters either for ZFS or for multipath that can be tuned within the VM to prevent this issue with the SAN cache?

Output of zpool status

  pool: sanpool
 state: ONLINE
  scan: scrub repaired 0B in 2 days 02:05:53 with 0 errors on Thu Mar 17 15:50:34 2022
config:

    NAME                                        STATE     READ WRITE CKSUM
    sanpool                                     ONLINE       0     0     0
      raidz1-0                                  ONLINE       0     0     0
        wwn-0x60060e8012b003005040b0030000002e  ONLINE       0     0     0
        wwn-0x60060e8012b003005040b0030000002f  ONLINE       0     0     0
        wwn-0x60060e8012b003005040b00300000031  ONLINE       0     0     0
        wwn-0x60060e8012b003005040b00300000032  ONLINE       0     0     0
        wwn-0x60060e8012b003005040b00300000033  ONLINE       0     0     0
      raidz1-1                                  ONLINE       0     0     0
        wwn-0x60060e8012b003005040b00300000034  ONLINE       0     0     0
        wwn-0x60060e8012b003005040b00300000035  ONLINE       0     0     0
        wwn-0x60060e8012b003005040b00300000036  ONLINE       0     0     0
        wwn-0x60060e8012b003005040b00300000037  ONLINE       0     0     0
        wwn-0x60060e8012b003005040b00300000038  ONLINE       0     0     0
      raidz1-2                                  ONLINE       0     0     0
        wwn-0x60060e8012b003005040b00300000062  ONLINE       0     0     0
        wwn-0x60060e8012b003005040b00300000063  ONLINE       0     0     0
        wwn-0x60060e8012b003005040b00300000064  ONLINE       0     0     0
        wwn-0x60060e8012b003005040b00300000065  ONLINE       0     0     0
        wwn-0x60060e8012b003005040b00300000066  ONLINE       0     0     0
      raidz1-3                                  ONLINE       0     0     0
        wwn-0x60060e8012b003005040b0030000006a  ONLINE       0     0     0
        wwn-0x60060e8012b003005040b0030000006b  ONLINE       0     0     0
        wwn-0x60060e8012b003005040b0030000006c  ONLINE       0     0     0
        wwn-0x60060e8012b003005040b0030000006d  ONLINE       0     0     0
        wwn-0x60060e8012b003005040b0030000006f  ONLINE       0     0     0
      raidz1-4                                  ONLINE       0     0     0
        wwn-0x60060e8012b003005040b00300000070  ONLINE       0     0     0
        wwn-0x60060e8012b003005040b00300000071  ONLINE       0     0     0
        wwn-0x60060e8012b003005040b00300000072  ONLINE       0     0     0
        wwn-0x60060e8012b003005040b00300000073  ONLINE       0     0     0
        wwn-0x60060e8012b003005040b00300000074  ONLINE       0     0     0
      raidz1-5                                  ONLINE       0     0     0
        wwn-0x60060e8012b003005040b00300000075  ONLINE       0     0     0
        wwn-0x60060e8012b003005040b00300000076  ONLINE       0     0     0
        wwn-0x60060e8012b003005040b00300000077  ONLINE       0     0     0
        wwn-0x60060e8012b003005040b00300000079  ONLINE       0     0     0
        wwn-0x60060e8012b003005040b0030000007a  ONLINE       0     0     0

errors: No known data errors

Output of multipath -ll

mpathr (360060e8012b003005040b00300000074) dm-18 HITACHI,OPEN-V
size=11T features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
  |- 7:0:0:25 sdz  65:144 active ready running
  `- 8:0:0:25 sdbd 67:112 active ready running
mpathe (360060e8012b003005040b00300000064) dm-5 HITACHI,OPEN-V
size=11T features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
  |- 7:0:0:13 sdn  8:208  active ready running
  `- 8:0:0:13 sdar 66:176 active ready running
mpathq (360060e8012b003005040b00300000073) dm-17 HITACHI,OPEN-V
size=11T features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
  |- 7:0:0:24 sdy  65:128 active ready running
  `- 8:0:0:24 sdbc 67:96  active ready running
mpathd (360060e8012b003005040b00300000063) dm-4 HITACHI,OPEN-V
size=11T features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
  |- 7:0:0:12 sdm  8:192  active ready running
  `- 8:0:0:12 sdaq 66:160 active ready running
mpathp (360060e8012b003005040b00300000072) dm-16 HITACHI,OPEN-V
size=11T features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
  |- 7:0:0:23 sdx  65:112 active ready running
  `- 8:0:0:23 sdbb 67:80  active ready running
mpathc (360060e8012b003005040b00300000062) dm-3 HITACHI,OPEN-V
size=11T features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
  |- 7:0:0:11 sdl  8:176  active ready running
  `- 8:0:0:11 sdap 66:144 active ready running
mpatho (360060e8012b003005040b00300000071) dm-15 HITACHI,OPEN-V
size=11T features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
  |- 7:0:0:22 sdw  65:96  active ready running
  `- 8:0:0:22 sdba 67:64  active ready running
mpathb (360060e8012b003005040b00300000038) dm-2 HITACHI,OPEN-V
size=11T features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
  |- 7:0:0:10 sdk  8:160  active ready running
  `- 8:0:0:10 sdao 66:128 active ready running
mpathn (360060e8012b003005040b00300000070) dm-14 HITACHI,OPEN-V
size=11T features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
  |- 7:0:0:21 sdv  65:80  active ready running
  `- 8:0:0:21 sdaz 67:48  active ready running
mpatha (360060e8012b003005040b0030000002e) dm-1 HITACHI,OPEN-V
size=11T features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
  |- 7:0:0:1  sdb  8:16   active ready running
  `- 8:0:0:1  sdaf 65:240 active ready running
mpathz (360060e8012b003005040b00300000033) dm-26 HITACHI,OPEN-V
size=11T features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
  |- 7:0:0:5  sdf  8:80   active ready running
  `- 8:0:0:5  sdaj 66:48  active ready running
mpathm (360060e8012b003005040b0030000006f) dm-13 HITACHI,OPEN-V
size=11T features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
  |- 7:0:0:20 sdu  65:64  active ready running
  `- 8:0:0:20 sday 67:32  active ready running
mpathy (360060e8012b003005040b00300000032) dm-25 HITACHI,OPEN-V
size=11T features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
  |- 7:0:0:4  sde  8:64   active ready running
  `- 8:0:0:4  sdai 66:32  active ready running
mpathl (360060e8012b003005040b0030000002f) dm-12 HITACHI,OPEN-V
size=11T features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
  |- 7:0:0:2  sdc  8:32   active ready running
  `- 8:0:0:2  sdag 66:0   active ready running
mpathx (360060e8012b003005040b0030000007a) dm-24 HITACHI,OPEN-V
size=11T features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
  |- 7:0:0:30 sdae 65:224 active ready running
  `- 8:0:0:30 sdbi 67:192 active ready running
mpathad (360060e8012b003005040b00300000037) dm-30 HITACHI,OPEN-V
size=11T features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
  |- 7:0:0:9  sdj  8:144  active ready running
  `- 8:0:0:9  sdan 66:112 active ready running
mpathk (360060e8012b003005040b0030000006d) dm-11 HITACHI,OPEN-V
size=11T features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
  |- 7:0:0:19 sdt  65:48  active ready running
  `- 8:0:0:19 sdax 67:16  active ready running
mpathw (360060e8012b003005040b00300000031) dm-23 HITACHI,OPEN-V
size=11T features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
  |- 7:0:0:3  sdd  8:48   active ready running
  `- 8:0:0:3  sdah 66:16  active ready running
mpathac (360060e8012b003005040b00300000036) dm-29 HITACHI,OPEN-V
size=11T features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
  |- 7:0:0:8  sdi  8:128  active ready running
  `- 8:0:0:8  sdam 66:96  active ready running
mpathj (360060e8012b003005040b0030000006c) dm-10 HITACHI,OPEN-V
size=11T features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
  |- 7:0:0:18 sds  65:32  active ready running
  `- 8:0:0:18 sdaw 67:0   active ready running
mpathv (360060e8012b003005040b00300000079) dm-22 HITACHI,OPEN-V
size=11T features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
  |- 7:0:0:29 sdad 65:208 active ready running
  `- 8:0:0:29 sdbh 67:176 active ready running
mpathab (360060e8012b003005040b00300000035) dm-28 HITACHI,OPEN-V
size=11T features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
  |- 7:0:0:7  sdh  8:112  active ready running
  `- 8:0:0:7  sdal 66:80  active ready running
mpathi (360060e8012b003005040b0030000006b) dm-9 HITACHI,OPEN-V
size=11T features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
  |- 7:0:0:17 sdr  65:16  active ready running
  `- 8:0:0:17 sdav 66:240 active ready running
mpathu (360060e8012b003005040b00300000077) dm-21 HITACHI,OPEN-V
size=11T features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
  |- 7:0:0:28 sdac 65:192 active ready running
  `- 8:0:0:28 sdbg 67:160 active ready running
mpathaa (360060e8012b003005040b00300000034) dm-27 HITACHI,OPEN-V
size=11T features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
  |- 7:0:0:6  sdg  8:96   active ready running
  `- 8:0:0:6  sdak 66:64  active ready running
mpathh (360060e8012b003005040b0030000006a) dm-8 HITACHI,OPEN-V
size=11T features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
  |- 7:0:0:16 sdq  65:0   active ready running
  `- 8:0:0:16 sdau 66:224 active ready running
mpatht (360060e8012b003005040b00300000076) dm-20 HITACHI,OPEN-V
size=11T features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
  |- 7:0:0:27 sdab 65:176 active ready running
  `- 8:0:0:27 sdbf 67:144 active ready running
mpathg (360060e8012b003005040b00300000066) dm-7 HITACHI,OPEN-V
size=11T features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
  |- 7:0:0:15 sdp  8:240  active ready running
  `- 8:0:0:15 sdat 66:208 active ready running
mpaths (360060e8012b003005040b00300000075) dm-19 HITACHI,OPEN-V
size=11T features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
  |- 7:0:0:26 sdaa 65:160 active ready running
  `- 8:0:0:26 sdbe 67:128 active ready running
mpathf (360060e8012b003005040b00300000065) dm-6 HITACHI,OPEN-V
size=11T features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
  |- 7:0:0:14 sdo  8:224  active ready running
  `- 8:0:0:14 sdas 66:192 active ready running

Best Answer

You're looking in the wrong spot. If your SAN faults under load, then you can't rely on it, period. Fix the SAN.
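
As a stopgap while the SAN is being looked at, a running scrub can be paused during heavy write periods and resumed afterwards: `zpool scrub -p <pool>` pauses it, and a plain `zpool scrub <pool>` resumes it. This only avoids the trigger; it doesn't fix anything. A minimal sketch of that decision, assuming a scrub has already been started and that the write rate is measured elsewhere. The function names and the 1000 MB/s threshold (taken from the "> 1 GB/s" in the question) are hypothetical:

```python
import subprocess

POOL = "sanpool"            # pool name from the zpool status output in the question
WRITE_LIMIT_MB_S = 1000.0   # assumption: the "> 1 GB/s" load at which errors appeared


def scrub_command(write_mb_s, scrub_paused, pool=POOL, limit=WRITE_LIMIT_MB_S):
    """Return the zpool command to run, or None if nothing needs to change.

    Assumes a scrub has been started on the pool (running or paused).
    write_mb_s   -- current write throughput in MB/s (hypothetical input,
                    measured by whatever monitoring is already in place)
    scrub_paused -- True if the scrub is currently paused
    """
    if write_mb_s > limit and not scrub_paused:
        return ["zpool", "scrub", "-p", pool]   # pause the running scrub
    if write_mb_s <= limit and scrub_paused:
        return ["zpool", "scrub", pool]         # resume the paused scrub
    return None                                 # state already matches the load


def run_scrub_command(cmd):
    """Execute the command returned by scrub_command (needs root on the ZFS host)."""
    if cmd is not None:
        subprocess.run(cmd, check=True)
```

Keeping the decision separate from the execution means the logic can be checked without touching the pool; `run_scrub_command(scrub_command(rate, paused))` would be called periodically, for example from a cron job or a systemd timer.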