Does a 3ware “ECC-ERROR” matter on a JBOD when I have ZFS?

3ware, hard drive, storage, zfs

I have a FreeBSD 8.x machine running ZFS on disks attached to a 3ware 9690SA controller.

The 3ware controller shows an ECC-ERROR with one of the disks:

//host> /c0 show
VPort Status         Unit Size      Type  Phy Encl-Slot    Model
------------------------------------------------------------------------------
p0    OK             u0   279.39 GB SAS   0   -            SEAGATE ST3300657SS 
p1    OK             u0   279.39 GB SAS   1   -            SEAGATE ST3300657SS 
p2    OK             u1   931.51 GB SAS   2   -            SEAGATE ST31000640SS
p3    ECC-ERROR      u2   931.51 GB SAS   3   -            SEAGATE ST31000640SS
p4    OK             u3   931.51 GB SAS   4   -            SEAGATE ST31000640SS
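
To watch for this condition without reading the table by eye, the output above can be parsed. Here is a minimal sketch in Python. The function name `non_ok_ports` is my own, and the parsing assumes the whitespace-separated column layout shown in this question, which may differ in other tw_cli versions.

```python
def non_ok_ports(show_output):
    """Return (port, status, unit) for each port row whose status is not OK."""
    flagged = []
    for line in show_output.splitlines():
        fields = line.split()
        # Port rows start with p<number> (e.g. "p3"); this skips the
        # column header and the dashed separator line.
        if len(fields) >= 3 and fields[0].startswith("p") and fields[0][1:].isdigit():
            port, status, unit = fields[:3]
            if status != "OK":
                flagged.append((port, status, unit))
    return flagged


# Sample copied from the "/c0 show" output in this question.
SAMPLE = """\
VPort Status         Unit Size      Type  Phy Encl-Slot    Model
------------------------------------------------------------------------------
p0    OK             u0   279.39 GB SAS   0   -            SEAGATE ST3300657SS
p1    OK             u0   279.39 GB SAS   1   -            SEAGATE ST3300657SS
p2    OK             u1   931.51 GB SAS   2   -            SEAGATE ST31000640SS
p3    ECC-ERROR      u2   931.51 GB SAS   3   -            SEAGATE ST31000640SS
p4    OK             u3   931.51 GB SAS   4   -            SEAGATE ST31000640SS
"""

print(non_ok_ports(SAMPLE))  # [('p3', 'ECC-ERROR', 'u2')]
```

In practice the text would come from running tw_cli itself rather than from a pasted sample.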

/c0 show events shows no ECC errors in its recent history.

ZFS does not currently detect any errors; zpool status reports "No known data errors".

My question: Is this ECC-ERROR something that I need to be concerned about?

According to the 3ware CLI 9.5.2 Manual, an ECC-ERROR means that the 3ware controller encountered a read error on one or more sectors of this drive. This sometimes occurs when a RAID array is recovering from a failed disk. I believe that ECC-ERRORs can also be detected when the 3ware controller verifies each disk. None of the drives have failed, so there was no rebuild; I therefore assume that the controller discovered a bad sector during its weekly auto-verify scan of the disks. Is this a safe assumption?

According to our logs, ZFS has not detected any bad sectors on this drive. ZFS can work around read errors: if ZFS detects a bad sector on the drive, it will simply mark that sector as bad and never use it again. From the ZFS perspective, one bad sector isn't a big deal, although it might indicate that the drive is starting to go bad.

I can clear the ECC-ERROR status using tw_cli /c0 rescan; according to the tw_cli man page, "Rescanning the controller will clear the error status if the condition no longer exists". Since ECC errors occur only sometimes, when particular disk sectors are read, the ECC-ERROR goes away. And since ZFS has presumably moved the data from that bad sector to another region of the disk and marked the sector as 'bad', the bad sector will never be read again.

Best Answer

According to the docs, with a single drive this means that your data may or may not be corrupted. ZFS stores checksums of objects, so it can verify data integrity. Be sure to use RAID and to schedule regular integrity checks.
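
As a rough illustration of what a scheduled integrity check could look like, here is a minimal Python sketch built on the standard `zpool scrub` and `zpool status` commands. The function names, the injectable `run` parameter, and the pool name `tank` in the usage note are my own assumptions, not anything from the question. A scrub runs in the background, so the status check is meant to be run later (for example from a second cron job), not immediately after starting the scrub.

```python
import subprocess


def start_scrub(pool, run=subprocess.run):
    """Ask ZFS to begin re-reading the pool and verifying block checksums."""
    run(["zpool", "scrub", pool], check=True)


def pool_reports_clean(pool, run=subprocess.run):
    """Return True if 'zpool status' prints 'No known data errors' for the pool."""
    result = run(
        ["zpool", "status", pool],
        check=True,
        capture_output=True,
        text=True,
    )
    return "No known data errors" in result.stdout
```

For a hypothetical pool named `tank`, one job would call `start_scrub("tank")`, and a later job would alert if `pool_reports_clean("tank")` returns False. Matching on the message text is a simplification; reading the full `zpool status` output is still the more reliable check.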