Storage – How to Resolve Btrfs Checksum Errors After Disk Replacement

btrfs, smart, storage

I had a pair of 3TB disks in a btrfs raid1 array.

One of these disks started failing (smartd reported bad sectors), so I bought two new 8TB drives to replace both disks in the array.

I replaced both disks using btrfs replace and then ran a btrfs balance, which fails with the following message:

[ 5063.136378] BTRFS error (device sdc): parent transid verify failed on 5153170751488 wanted 1433374 found 1417912
[ 5063.140428] BTRFS error (device sdc): parent transid verify failed on 5153170751488 wanted 1433374 found 1417912
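To make the message easier to read, here is a minimal sketch (Python, not part of the original question) that pulls the device, the tree block's logical address, and the "wanted" and "found" transaction IDs (generations) out of log lines like the two above. The names `TransidError` and `parse_transid_errors` are hypothetical, and the regular expression assumes the exact message wording shown here. In the kernel message, "wanted" is the generation the parent node expects and "found" is the generation stored in the block itself, so a found value lower than wanted suggests the copy on disk is older than its parent expects.

```python
import re
from typing import Iterable, List, NamedTuple

# Assumes the exact wording of the kernel message quoted in the question.
TRANSID_RE = re.compile(
    r"BTRFS error \(device (?P<device>\S+)\): parent transid verify failed "
    r"on (?P<bytenr>\d+) wanted (?P<wanted>\d+) found (?P<found>\d+)"
)


class TransidError(NamedTuple):
    device: str   # device name as printed by the kernel, e.g. "sdc"
    bytenr: int   # logical address of the tree block that failed verification
    wanted: int   # generation the parent node expects
    found: int    # generation actually stored in the block


def parse_transid_errors(lines: Iterable[str]) -> List[TransidError]:
    """Return each distinct transid error once, in first-seen order."""
    seen = {}  # dict used as an insertion-ordered set
    for line in lines:
        m = TRANSID_RE.search(line)
        if m is None:
            continue  # skip unrelated log lines
        err = TransidError(
            m["device"], int(m["bytenr"]), int(m["wanted"]), int(m["found"])
        )
        seen.setdefault(err, None)
    return list(seen)


if __name__ == "__main__":
    sample = [
        "[ 5063.136378] BTRFS error (device sdc): parent transid verify failed "
        "on 5153170751488 wanted 1433374 found 1417912",
        "[ 5063.140428] BTRFS error (device sdc): parent transid verify failed "
        "on 5153170751488 wanted 1433374 found 1417912",
    ]
    for err in parse_transid_errors(sample):
        # prints: sdc 5153170751488 generation gap: 15462
        print(err.device, err.bytenr, "generation gap:", err.wanted - err.found)
```

Both lines in this log collapse to a single block whose stored generation is 15462 transactions behind what its parent expects. Because the error persists after both disks were replaced, the stale block appears to have been copied onto the new drives by btrfs replace.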

I saw these same messages before replacing the disks, but since both disks have now been replaced, I believe the problem lies in the btrfs filesystem itself rather than in the hardware.

My data is fully backed up and the filesystem is online and working properly, but I cannot perform a balance because of this error. Running a scrub reports a small number of uncorrectable errors, just as it did before I replaced the disks.

I was wondering whether I could:

  1. Find out which files are corrupted and restore them from a backup
  2. Reset the transaction on the filesystem to remove the errors
  3. Ignore the errors while balancing

…or any other reasonable solution.

Thanks!

Best Answer

I made a few more attempts to solve this, and in the end only reformatting the filesystem from scratch solved the issue.

Once I had transferred the data off the disks, I tried two dangerous commands, btrfs check --init-csum-tree and btrfs check --repair. Neither did any harm, but neither solved the issue.

After reformatting, I transferred the data back onto the filesystem, ran a btrfs balance and a btrfs scrub, and now everything is working again.

Cheers!
