Linux – How to make a Linux software RAID1 detect disc corruption

corruption, linux, raid1, software-raid

This is one of those nightmare days: a virtualized server running on a Linux SW-RAID1 hosts a VM that exhibits random segfaults in seemingly random chunks of code.

While debugging I find that a file returns a different md5sum on every run. Digging deeper, I find this: the raw disc partitions that make up the RAID1 mirror differ in 2 bits, and about 9 sectors are completely empty on one disc but filled with data on the other.

Obviously Linux returns each sector from a non-deterministically chosen disc of the mirror set, so sometimes the same sector comes back OK and sometimes the corrupted copy is returned.
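In case anyone wants to reproduce the comparison: something like the following is enough, with /dev/sda3 and /dev/sdb3 standing in for the actual mirror components. A raw cmp of the components only lines up like this if the array uses old 0.90 metadata, where the data area starts at offset 0 of each component.

# compare the two RAID1 components byte by byte; any output means the mirror halves diverge
# (/dev/sda3 and /dev/sdb3 are placeholders for the real component partitions)
cmp -l /dev/sda3 /dev/sdb3 | head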

The docs say:

RAID cannot and is not supposed to guard against data corruption on the media. Therefore, it doesn't make any sense either, to purposely corrupt data (using dd for example) on a disk to see how the RAID system will handle that. It is most likely (unless you corrupt the RAID superblock) that the RAID layer will never find out about the corruption, but your filesystem on the RAID device will be corrupted.

Thanks. That will help me sleep. :-/

Is there a way to have Linux at least detect this corruption, for example by sector checksumming or something like that? Would this be detected in a RAID5 setup? Is this the moment I wish I had used ZFS or btrfs (once it becomes usable without uber-admin capabilities)?

Edit: I am not alone.

Best Answer

You can force a check of (eg) md0 with

echo "check" > /sys/block/md0/md/sync_action

You can check the state of the test with

cat /sys/block/md0/md/sync_action

While it returns check, the check is still running; once it returns idle, you can do a

cat /sys/block/md0/md/mismatch_cnt

to see whether the mismatch count is zero or not. Many distros automate this check to run, e.g., weekly, just as most industrial hardware RAID controllers continually run it in the background (they often call it "RAID scrubbing") while the array is otherwise idle. Note that, according to the comments in Fedora's automated check script, RAID1 writes in the kernel are unbuffered, so mismatch counts can be non-zero even for a healthy array if the array is mounted.
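If your distro doesn't ship such a job, a minimal sketch of one, as an /etc/cron.d entry, might look like this (array name md0 and the schedule are just examples):

# kick off a scrub of md0 early on Sunday mornings (example schedule, example array)
30 1 * * 0  root  [ -w /sys/block/md0/md/sync_action ] && echo check > /sys/block/md0/md/sync_action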

So, if at all possible, quiescing the array by running this check while the VM is down is probably a good idea.
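Putting the steps together, a minimal one-shot scrub-and-report sketch (md0 assumed; run as root, ideally with the VM shut down) could look like:

#!/bin/sh
# Sketch: scrub md0, wait for the check to finish, then print the mismatch count.
MD=md0
echo check > /sys/block/$MD/md/sync_action
# poll until the array returns to "idle"
while [ "$(cat /sys/block/$MD/md/sync_action)" != "idle" ]; do
    sleep 60
done
echo "$MD mismatch_cnt: $(cat /sys/block/$MD/md/mismatch_cnt)"

A non-zero count tells you the mirror halves disagree somewhere; it doesn't tell you which half is right.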

I'd add that I agree with the docs when they say that

RAID cannot and is not supposed to guard against data corruption on the media

RAID is supposed to guard against complete failure of a device; guarding against incremental random failures in elements of a storage device is a job for error-checking and block-remapping, which is probably best done in the controller itself. I'm happy that the docs warn people of the limitations of RAID, especially if it's implemented on top of flaky devices. I find that frequent smartctl health checks of my drives help me to stay on top of drives which are starting to show the sort of errors that lead to out-of-sync mirrors.
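Something as simple as this is enough for those checks (device names are examples for the two mirror members):

# quick overall health verdict for each member disc
smartctl -H /dev/sda
smartctl -H /dev/sdb
# optionally start a short self-test (results later via smartctl -l selftest)
smartctl -t short /dev/sda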