BTRFS unmountable after cold reboot (total_rw_bytes is twice too big)

btrfs, disaster-recovery, filesystems

One of my users in a research environment triggered an out-of-memory condition on a server that mounts a 52TB btrfs partition, and I had to power cycle the server.
After the reboot, the btrfs partition can no longer be mounted in read-write mode.

mount /mnt/storage/
mount: /mnt/storage: wrong fs type, bad option, bad superblock on /dev/mapper/fc_trunk-part3, missing codepage or helper program, or other error.

Kernel logs show a problem with device size:

Mar 19 15:10:52 mamut kernel: BTRFS error (device dm-5): open_ctree failed
Mar 19 15:10:52 mamut kernel: BTRFS info (device dm-5): use lzo compression, level 0
Mar 19 15:10:52 mamut kernel: BTRFS info (device dm-5): disk space caching is enabled
Mar 19 15:10:52 mamut kernel: BTRFS info (device dm-5): has skinny extents
Mar 19 15:10:52 mamut systemd[1]: mnt-storage.mount: Mount process exited, code=killed, status=15/TERM
Mar 19 15:10:52 mamut systemd[1]: mnt-storage.mount: Failed with result 'timeout'.
Mar 19 15:10:52 mamut systemd[1]: Failed to mount /mnt/storage.
Mar 19 15:10:52 mamut kernel: BTRFS error (device dm-5): super_total_bytes 52798547820544 mismatch with fs_devices total_rw_bytes 105597095641088
Mar 19 15:10:52 mamut kernel: BTRFS error (device dm-5): failed to read chunk tree: -22
Mar 19 15:10:52 mamut kernel: BTRFS error (device dm-5): open_ctree failed
[...]
Mar 19 15:15:52 mamut systemd-helper[9798]: IO Error (subvolume is not a btrfs subvolume).
Mar 19 15:15:52 mamut systemd-helper[9798]: number cleanup for 'storage' failed.
Mar 19 15:15:52 mamut systemd-helper[9798]: running timeline cleanup for 'storage'.
Mar 19 15:15:52 mamut systemd-helper[9798]: IO Error (subvolume is not a btrfs subvolume).
Mar 19 15:15:52 mamut systemd-helper[9798]: timeline cleanup for 'storage' failed.
Mar 19 15:15:52 mamut systemd-helper[9798]: running empty-pre-post cleanup for 'storage'.
Mar 19 15:15:52 mamut systemd-helper[9798]: IO Error (subvolume is not a btrfs subvolume).
Mar 19 15:15:52 mamut systemd-helper[9798]: empty-pre-post cleanup for storage failed.
Mar 19 15:15:52 mamut systemd[1]: snapper-cleanup.service: Main process exited, code=exited, status=1/FAILURE
Mar 19 15:15:52 mamut systemd[1]: snapper-cleanup.service: Failed with result 'exit-code'.

The super_total_bytes=52798547820544 is the correct size of the partition in bytes reported by fdisk.
fs_devices total_rw_bytes=105597095641088 is exactly twice that.
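
A quick shell check of that arithmetic, using only the two values from the kernel log above (the variable names are just labels for this sketch):

```shell
#!/bin/sh
# Values copied from the "super_total_bytes ... mismatch" kernel message.
super_total_bytes=52798547820544
total_rw_bytes=105597095641088

# Integer ratio and remainder; a remainder of 0 means an exact multiple.
ratio=$(( total_rw_bytes / super_total_bytes ))
remainder=$(( total_rw_bytes % super_total_bytes ))

echo "ratio=$ratio remainder=$remainder"
```

A ratio of 2 with no remainder is consistent with the device being counted twice, although the log alone does not say why.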

I tried running btrfs check but got this error:

btrfs check /dev/mapper/fc_trunk-part3
Opening filesystem to check...
Checking filesystem on /dev/mapper/fc_trunk-part3
UUID: 40a2e65b-f34a-4d33-946d-055d93fe7ffa
[1/7] checking root items
ERROR: failed to repair root items: Input/output error

Now, I know about btrfs rescue fix-device-size, but I have never run it before. The man page says:

fix-device-size 
           fix device size and super block total bytes values that are do
           not match

           Kernel 4.11 starts to check the device size more strictly and
           this might mismatch the stored value of total bytes. See the
           exact error message below. Newer kernel will refuse to mount the
           filesystem where the values do not match. This error is not fatal
           and can be fixed. This command will fix the device size values if
           possible.

               BTRFS error (device sdb): super_total_bytes 92017859088384 mismatch with fs_devices total_rw_bytes 92017859094528

           The mismatch may also exhibit as a kernel warning:

               WARNING: CPU: 3 PID: 439 at fs/btrfs/ctree.h:1559 btrfs_update_device+0x1c5/0x1d0 [btrfs]

The kernel version did change after the reboot, but both versions are newer than 4.11, and I previously had no problems mounting this partition.

The partition:

  • is big, so backing it up would take a lot of time and space that I don't have
  • has critical data for my research
  • has snapshots
  • can be mounted with -o rescue,ro

Is it safe to call btrfs rescue fix-device-size?

Can I fix it in some other safe way?

Best Answer

"Is it safe to call btrfs rescue fix-device-size?"

It's potentially safe, and this is very likely the solution. It "shouldn't" eat your entire volume and several cats. If this BTRFS filesystem has multiple disks (for example, in a BTRFS RAID), I'm suddenly less confident in this assertion.
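
Before running the repair, it may help to look at what the superblock actually records and whether the filesystem spans more than one device. The sketch below is read-only and assumes the device path from the question; `super_field` is a hypothetical helper name, and the commands only execute when `RUN=1` is set.

```shell
#!/bin/sh
# Sketch: read-only inspection before attempting fix-device-size.
DEV=${DEV:-/dev/mapper/fc_trunk-part3}   # device path taken from the question

# Print the value of one field from `btrfs inspect-internal dump-super`
# output supplied on stdin (lines of the form "<field> <value>").
super_field() {
    awk -v key="$1" '$1 == key { print $2; exit }'
}

if [ "${RUN:-0}" = 1 ]; then
    dump=$(btrfs inspect-internal dump-super "$DEV")
    echo "num_devices:          $(printf '%s\n' "$dump" | super_field num_devices)"
    echo "total_bytes:          $(printf '%s\n' "$dump" | super_field total_bytes)"
    echo "dev_item.total_bytes: $(printf '%s\n' "$dump" | super_field dev_item.total_bytes)"
    # Lists every device that belongs to this filesystem.
    btrfs filesystem show "$DEV"
fi
```

If `num_devices` is 1 and `total_bytes` matches the partition size, this looks like the single-device case that fix-device-size is meant for.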

If you have a block based snapshot mechanism below BTRFS (it looks like you might - is that an LVM volume backing it?) then take a snapshot prior to doing this. You may need to add more physical volumes to that volume group in order to accommodate the snapshot itself depending on how this volume group (if that's what it is) is already allocated. An LVM snapshot will grow in size as data is modified, proportionate to the amount of data modified. An LVM snapshot will also incur a 2x write performance hit while active, so don't keep it around after you're done. This is just so you can roll back if things go very badly.
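
A rough sketch of that snapshot-then-repair sequence, assuming the volume really is an LVM logical volume. The volume group, logical volume, snapshot name and snapshot size below are all hypothetical placeholders, and `run` only prints each command unless `RUN=1` is set.

```shell
#!/bin/sh
# Hypothetical names: replace with the real volume group / logical volume.
VG=${VG:-vg_storage}
LV=${LV:-lv_storage}
SNAP=${SNAP:-pre_fix_snap}
SNAP_SIZE=${SNAP_SIZE:-100G}   # assumed; must hold every block changed while the snapshot exists

# Print a command; execute it only when RUN=1.
run() {
    echo "+ $*"
    if [ "${RUN:-0}" = 1 ]; then "$@"; fi
}

snapshot_then_fix() {
    # 1. Snapshot the origin so the repair can be undone.
    run lvcreate --snapshot --size "$SNAP_SIZE" --name "$SNAP" "$VG/$LV"
    # 2. Attempt the repair on the (unmounted) origin volume.
    run btrfs rescue fix-device-size "/dev/$VG/$LV"
}

snapshot_then_fix

# If the repair went well, drop the snapshot:
#   lvremove "$VG/$SNAP"
# If it went badly, roll the origin back (it must not be in use):
#   lvconvert --merge "$VG/$SNAP"
```

Running it without `RUN=1` just shows the plan, which is a cheap way to double-check the names before touching anything.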

If it's really important data, do a block based backup to another totally unrelated volume before doing anything - especially if you're not intimately familiar with LVM snapshots or this isn't on LVM. dd is a good command for that.

dd if=/dev/disk/with-btrfs of=/large/enough/volume/backup.img bs=4M
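
Since the image will be as large as the whole block device, it is worth confirming that the destination has room before starting. A minimal sketch, assuming GNU coreutils (`df --output`, `dd status=progress`) and util-linux (`blockdev`); the paths are the same placeholders as above, `image_fits` is a hypothetical helper, and nothing is executed unless `RUN=1` is set.

```shell
#!/bin/sh
SRC=${SRC:-/dev/disk/with-btrfs}            # placeholder source device
DEST_DIR=${DEST_DIR:-/large/enough/volume}  # placeholder destination directory

# Succeeds if an image of $1 bytes fits into $2 bytes of free space.
image_fits() {
    [ "$1" -le "$2" ]
}

if [ "${RUN:-0}" = 1 ]; then
    need=$(blockdev --getsize64 "$SRC")
    avail=$(df --output=avail -B1 "$DEST_DIR" | tail -n 1 | tr -d ' ')
    if image_fits "$need" "$avail"; then
        # status=progress prints a running byte count for long copies.
        dd if="$SRC" of="$DEST_DIR/backup.img" bs=4M status=progress
    else
        echo "not enough space: need $need bytes, have $avail" >&2
        exit 1
    fi
fi
```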
