FreeBSD – ZFS vdevs accumulate checksum errors, but individual disks do not

checksum, freebsd, truenas, zfs, zpool

I'm running a vendor-specific derivative of FreeNAS 9.3.

My trouble started when I installed a new JBOD chassis to add two new vdevs to my pool, and the chassis had a bad board. During this time I was seeing SAS power errors on the drives attached to the bad board: my new drives were effectively turning on and off again, repeatedly, every minute.

I replaced the board and now, by most measures, the drives are functioning well, but ZFS is still giving me extremely strange checksum errors when I view zpool status. I think there were some bad CoW writes when I was having the SAS power issues.

The head chassis (CPU, boot drive, RAM, etc.) connects to the first JBOD expansion chassis via mini-SAS, and the second JBOD expansion chassis is daisy-chained off the first, also via mini-SAS.

  • [Chassis 1: boot drive, two L2ARC SSDs, 11/11 drives of RAIDZ3-0, 1/11 drives of RAIDZ3-1] -> mini-SAS to Chassis 2
  • [Chassis 2: 10/11 drives of RAIDZ3-1, 6/11 drives of RAIDZ3-2] -> mini-SAS to Chassis 3
  • [Chassis 3: 5/11 drives of RAIDZ3-2, 11/11 drives of RAIDZ3-3]

The checksum errors don't neatly map to any one controller or chassis, but my hunch is that whatever data was being written to the new disks during the power issues was written badly across both new vdevs.
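
In case it's relevant, this is roughly how I've been mapping disks to HBAs and enclosures (the controller index 0 below is a placeholder, and sas2ircu may need to be installed separately):

    # list every disk and the controller/bus it attaches to
    camcontrol devlist
    # list the LSI controllers, then show enclosure/slot for each attached drive
    sas2ircu LIST
    sas2ircu 0 DISPLAY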

My HBAs are all on known-good LSI firmware: either 20.00.04.00 or 20.00.08.00.
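
For reference, this is how I read the firmware versions off the HBAs (a sketch; adapter numbering may differ):

    # print firmware/BIOS versions for every LSI adapter in the system
    sas2flash -listall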

I've swapped mini-SAS cables, and tried using different ports, to no avail.

The output of zpool status shows checksum errors accumulating on the two new vdevs, and after a scrub, a reboot, or a zpool clear, zpool status eventually marks those vdevs as degraded. Stranger still, it also marks some of the drives belonging to those vdevs as degraded, even though the error counts on the individual disks are all 0. zdb shows that the individual drives are flagged as degraded for having too many checksum errors, yet their checksum error counts are actually all 0. Also odd: the pool-level checksum error count is lower than the sum of the checksum errors from the two problem vdevs.
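
To rule out the physical disks themselves, I've also been pulling SMART data directly from the members of the problem vdevs (da30 below is just a placeholder device name), and nothing there looks unhealthy:

    # SMART health summary, error log, and defect counters for one member disk
    smartctl -a /dev/da30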

zpool status -v persistently shows a permanent error in a snapshot mapped to a 0x0 inode. That snapshot has long since been deleted, but the error can't be cleared by multiple scrubs, reboots, or zpool clear. Other permanent errors also float in and out, sometimes showing only as hex-code inodes and other times as parts of recent snapshots. I can't find any 0x0 with lsof.
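
My understanding (which may be wrong) is that entries like <0x357>:<0x2aef3> are <dataset ID>:<object ID> pairs, and that zdb can in principle map them back to a dataset and object. Something like the following is what I'd try, where the dataset name is hypothetical and the -U path is where I believe FreeNAS keeps its pool cache file:

    # list datasets and their IDs; look for ID 0x357 (855 decimal)
    zdb -U /data/zfs/zpool.cache -d pool
    # then dump object 0x2aef3 (175859 decimal) from the matching dataset
    zdb -U /data/zfs/zpool.cache -dddd pool/some-dataset 175859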

I believe there might be some kind of corruption in the pool's metadata.

I'm looking for a way to surgically remove these phantom snapshots or otherwise return my pool to a healthy state without destroying my data. I suspect that somewhere, ZFS is iterating over these corrupt phantom snapshots and causing both the bizarre checksum errors and the degraded states on the vdevs.

I have "cold" LTO backups of much of my important data, but otherwise, if I can't repair my pool, I'm preparing to set up a second server, offload everything to the "hot" second server, destroy my pool at the top level, and then reload from the hot backup.
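
The offload itself would just be a recursive send/receive to the second server, something along these lines (the hostname, target pool, and snapshot name are placeholders):

    # snapshot the entire pool recursively, then replicate everything to the hot backup server
    zfs snapshot -r pool@migrate
    zfs send -R pool@migrate | ssh backuphost zfs receive -dF tank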

Here's the output of zpool status -v:

[root@Jupiter] ~# zpool status -v
  pool: freenas-boot
 state: ONLINE
status: One or more devices are configured to use a non-native block size.
        Expect reduced performance.
action: Replace affected devices with devices that support the configured block size, or migrate data to a properly configured pool.
  scan: resilvered 944M in 0h17m with 0 errors on Tue Aug  9 11:56:28 2016
config:

    NAME        STATE     READ WRITE CKSUM
    freenas-boot  ONLINE       0     0     0
      mirror-0  ONLINE       0     0     0
        da46p2  ONLINE       0     0     0  block size: 8192B configured, 8388608B native
        da47p2  ONLINE       0     0     0  block size: 8192B configured, 8388608B native

errors: No known data errors

  pool: pool
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://illumos.org/msg/ZFS-8000-8A
  scan: scrub in progress since Fri Sep  9 22:43:51 2016
        6.27T scanned out of 145T at 1.11G/s, 35h27m to go
        0 repaired, 4.33% done
config:

    NAME                                            STATE     READ WRITE CKSUM
    pool                                            DEGRADED     0     0   118
      raidz3-0                                      ONLINE       0     0     0
        gptid/ac108605-265c-11e5-9a02-0cc47a599098  ONLINE       0     0     0
        gptid/ac591d4e-265c-11e5-9a02-0cc47a599098  ONLINE       0     0     0
        gptid/ac92fd0d-265c-11e5-9a02-0cc47a599098  ONLINE       0     0     0
        gptid/accd3076-265c-11e5-9a02-0cc47a599098  ONLINE       0     0     0
        gptid/ad067e97-265c-11e5-9a02-0cc47a599098  ONLINE       0     0     0
        gptid/ad46cbee-265c-11e5-9a02-0cc47a599098  ONLINE       0     0     0
        gptid/ad91ba17-265c-11e5-9a02-0cc47a599098  ONLINE       0     0     0
        gptid/adcbdd0a-265c-11e5-9a02-0cc47a599098  ONLINE       0     0     0
        gptid/ae07dc0d-265c-11e5-9a02-0cc47a599098  ONLINE       0     0     0
        gptid/ae494d10-265c-11e5-9a02-0cc47a599098  ONLINE       0     0     0
        gptid/ae93a3a5-265c-11e5-9a02-0cc47a599098  ONLINE       0     0     0
      raidz3-1                                      ONLINE       0     0     0
        gptid/12f6a4c5-c929-11e5-8075-0cc47a599098  ONLINE       0     0     0
        gptid/511ea1f9-1932-11e6-9b1e-0cc47a599098  ONLINE       0     0     0
        gptid/14436fcf-c929-11e5-8075-0cc47a599098  ONLINE       0     0     0
        gptid/14f50aa3-c929-11e5-8075-0cc47a599098  ONLINE       0     0     0
        gptid/159b5654-c929-11e5-8075-0cc47a599098  ONLINE       0     0     0
        gptid/163d682b-c929-11e5-8075-0cc47a599098  ONLINE       0     0     0
        gptid/16ee624e-c929-11e5-8075-0cc47a599098  ONLINE       0     0     0
        gptid/1799dde3-c929-11e5-8075-0cc47a599098  ONLINE       0     0     0
        gptid/184c2ea4-c929-11e5-8075-0cc47a599098  ONLINE       0     0     0
        gptid/18f51c30-c929-11e5-8075-0cc47a599098  ONLINE       0     0     0
        gptid/19a861ea-c929-11e5-8075-0cc47a599098  ONLINE       0     0     0
      raidz3-2                                      DEGRADED     0     0   236
        gptid/5f80fc42-4e00-11e6-b7cf-0cc47a599098  DEGRADED     0     0     0  too many errors
        gptid/60369e0f-4e00-11e6-b7cf-0cc47a599098  DEGRADED     0     0     0  too many errors
        gptid/60e8234a-4e00-11e6-b7cf-0cc47a599098  DEGRADED     0     0     0  too many errors
        gptid/61a235f2-4e00-11e6-b7cf-0cc47a599098  DEGRADED     0     0     0  too many errors
        gptid/62580471-4e00-11e6-b7cf-0cc47a599098  DEGRADED     0     0     0  too many errors
        gptid/6316a38a-4e00-11e6-b7cf-0cc47a599098  DEGRADED     0     0     0  too many errors
        gptid/63d4bce8-4e00-11e6-b7cf-0cc47a599098  DEGRADED     0     0     0  too many errors
        gptid/ebfc2b99-6893-11e6-9b09-0cc47a599098  ONLINE       0     0     0
        gptid/654f143a-4e00-11e6-b7cf-0cc47a599098  DEGRADED     0     0     0  too many errors
        gptid/66236b33-4e00-11e6-b7cf-0cc47a599098  DEGRADED     0     0     0  too many errors
        gptid/66eda3f6-4e00-11e6-b7cf-0cc47a599098  DEGRADED     0     0     0  too many errors
      raidz3-3                                      DEGRADED     0     0   176
        gptid/c77a9da9-4e02-11e6-b7cf-0cc47a599098  ONLINE       0     0     0
        gptid/c83e100e-4e02-11e6-b7cf-0cc47a599098  ONLINE       0     0     0
        gptid/c8fd9ced-4e02-11e6-b7cf-0cc47a599098  DEGRADED     0     0     0  too many errors
        gptid/c9bb21ba-4e02-11e6-b7cf-0cc47a599098  DEGRADED     0     0     0  too many errors
        gptid/ca7a48db-4e02-11e6-b7cf-0cc47a599098  DEGRADED     0     0     0  too many errors
        gptid/cb422329-4e02-11e6-b7cf-0cc47a599098  DEGRADED     0     0     0  too many errors
        gptid/cbfe4c21-4e02-11e6-b7cf-0cc47a599098  ONLINE       0     0     0
        gptid/ccc43528-4e02-11e6-b7cf-0cc47a599098  ONLINE       0     0     0
        gptid/cd93a34c-4e02-11e6-b7cf-0cc47a599098  ONLINE       0     0     0
        gptid/ce622f51-4e02-11e6-b7cf-0cc47a599098  ONLINE       0     0     0
        gptid/cf2591d3-4e02-11e6-b7cf-0cc47a599098  ONLINE       0     0     0
    cache
      gptid/aedd3872-265c-11e5-9a02-0cc47a599098    ONLINE       0     0     0
      gptid/af559c10-265c-11e5-9a02-0cc47a599098    ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        <0x357>:<0x2aef3>
        <0x37b>:<0x397285>
        pool/.system@auto-20160720.2300-2d:<0x0>

Via the FreeNAS GUI, I tried copying the System dataset from pool over to freenas-boot, and then tried using zfs destroy to delete the copy of pool/.system on pool while leaving the freenas-boot copy intact. I was able to use zfs destroy to delete everything within pool/.system that was listed in zfs list, but when I tried to destroy pool/.system itself, the shell returned the error: Cannot iterate filesystems: I/O error. I tried zfs destroy on pool/.system with the -f, -r, and -R flags, as per the Oracle ZFS documentation, to no avail.
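
Roughly, the sequence I attempted looked like this (the child dataset name below is illustrative, not exact):

    # list what's left under .system, destroy the children, then the parent
    zfs list -r -t filesystem,snapshot pool/.system
    zfs destroy -r pool/.system/samba4    # destroying the listed children worked
    zfs destroy -r pool/.system           # fails: cannot iterate filesystems: I/O error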

I started yet another scrub. Perhaps eliminating the contents of the pool copy of the System dataset (pool/.system) will allow the scrub to clear the metadata error for the phantom snapshot pool/.system@auto-20160720.2300-2d.

I'm wondering if it's possible to resilver each disk that's showing up as degraded, one by one, so that the "bad" metadata that's no longer referenced can be abandoned. I've resilvered two disks, but now I'm running into an issue where resilvering any additional disk causes the disks I've already resilvered to begin resilvering again at the same time. I believe this might be a ZFS bug related to periodic snapshot tasks, so I've deleted my periodic snapshot task and destroyed all my snapshots. Still, I'm hesitant to resilver another of the degraded drives, for fear that all the previously resilvered disks will resilver again, leaving me without any redundancy and eventually with a faulted pool.

After disabling my periodic snapshot tasks and deleting all my snapshots, I tried wiping one disk and then resilvering it, but the three disks I'd already resilvered started resilvering again. At this point I'm almost certain that two disks in each problem RAID-Z3 vdev would resilver at once, so if I attempt to resilver any more disks, I'll lose the redundancy in each of the problem vdevs and my pool will fault.
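
For clarity, the per-disk procedure I've been attempting is essentially an offline/wipe/replace-in-place cycle, roughly like this (the gptid is one of the degraded members from the status output above, da30 is a placeholder for that disk's device node, and the FreeNAS GUI's Replace action does the equivalent, including repartitioning):

    # take the member offline, wipe its partition table, then resilver it in place
    zpool offline pool gptid/c8fd9ced-4e02-11e6-b7cf-0cc47a599098
    gpart destroy -F da30
    zpool replace pool gptid/c8fd9ced-4e02-11e6-b7cf-0cc47a599098 da30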

One other bizarre behavior is that checking zpool status -v actually increases the pool's checksum error count incrementally, but checking zpool status does not. It's almost as if the -v flag itself is iterating over whatever mechanism is causing checksum errors.
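
A crude way I've been confirming this is just watching the pool-level CKSUM column between runs; neither command changes the pool:

    # the top-level CKSUM counter climbs only after the -v invocation
    zpool status -v pool | awk '$1 == "pool" {print $5}'
    zpool status pool | awk '$1 == "pool" {print $5}'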

Would using zdb -c on my pool somehow be able to "fix" these metadata errors?

Best Answer

The 0x0 and other hexadecimal numbers appear in place of file names and other object names when metadata is corrupted. If you cannot get rid of them by destroying the affected objects (as I understand it, they refer to snapshots), the damage is probably too extensive to repair. In that case I would restore the pool from backup, especially since you are also seeing other strange effects like broken metadata appearing and disappearing.
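
If you want to try once more before giving up, the usual sequence is to destroy the snapshot that the error list references, clear the error counters, and scrub again; for example (using the snapshot name from your output, and with no guarantee it will work given the I/O errors you have already hit):

    zfs destroy pool/.system@auto-20160720.2300-2d
    zpool clear pool
    zpool scrub pool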

The ZFS Administration Guide describes how to recover from most of these problems. ZFS also points you to a URL with further guidance when you run zpool status (the see: line in your output above).