BTrFS crash

btrfs

I create a new BTrFS raid10 file system using two 250GB drives and the second partition on a third 80GB drive. I create a subvol and snapshot. I mount the snapshot and start copying 8GB of data to it. It gets to around 1GB, then the desktop disappears and what looks like a non-interactive terminal comes up with dump/crash information. I don't have a camera handy or I'd take a picture and post it. It basically looks like stack trace info. Ctrl-Alt-F7 will eventually bring back the desktop, but the entire BTrFS portion of the OS is hung and unresponsive until I reboot.

I've reformatted and reproduced this problem 3 times now and I'm about to give up 🙁

I realize this problem may not be entirely BTrFS' fault, because I'm on Natty, which is still alpha.

More granular details in case I'm an idiot:

1) Create FS:
sudo mkfs.btrfs -m raid10 -d raid10 /dev/sda2 /dev/sdb /dev/sdc

2) Initial temporary mount:
mkdir /btrfs && sudo mount -t btrfs /dev/sda2 /btrfs

3) Create subvol
btrfs s c /btrfs/vm

4) Create initial snapshot: (optional)
btrfs s sn /btrfs/cantremember.snap.something

5) Unmount /btrfs and mount /btrfs/vm:
sudo mount -t btrfs -o subvol=vm /dev/sda2 /btrfs/vm

6) Copy data to subvolume.

7) Balance data across drives: (optional)
btrfs f bal <path>

(never get to this step 7…)
Am I doing something wrong?

EDIT: I managed to catch the tail end of the backtrace / crash info:

kernel BUG at /build/buildd/linux-2.6.38/fs/btrfs/extent-tree.c:8581

EDIT2: Removing the smallest (46GB) partition from the raid10 array seems to have eliminated the problem.

Best Answer

From the sounds of it, you're running into this:

Allocation is done on a round-robin basis. If you have a raid1 strategy on a volume made up of mismatched drives (volumes of differing sizes), your smaller volume may fill up while leaving lots of space free on your single largest drive. You can verify that this is an issue if there is any discrepancy between 'df' and 'btrfs filesystem df [mountpoint]' AND if the latter command also shows that "total" and "used" are the same on the "Data" line. A rebalance may mitigate this problem. (2.6.33)

  • If your volume does fill up in this manner, a rebalance may quickly cause an ENOSPC ("Error NO SPaCe left on device") oops. You may have to delete a relatively large file to resolve this impasse, then a rebalance will succeed. (2.6.33)

https://btrfs.wiki.kernel.org/index.php/Gotchas

Emphasis mine. The second partition on the third 80GB drive that you mention is probably filling up well before the pair of 250GB drives, and that is triggering this particular gotcha.
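To test for the condition the wiki describes, you can check whether "total" and "used" match on the "Data" line. Below is a rough sketch of that check, not an official tool. The function name data_chunks_full is made up for illustration, and it assumes the output of 'btrfs filesystem df' has lines shaped like "Data, RAID10: total=1.00GB, used=1.00GB"; the exact format may differ between btrfs-progs versions.

```shell
#!/bin/sh
# Sketch only: reads `btrfs filesystem df` output on stdin and reports
# whether the Data line shows total equal to used.
# ASSUMPTION: lines look like "Data, RAID10: total=1.00GB, used=1.00GB".
# data_chunks_full is a hypothetical helper name, not part of btrfs-progs.
data_chunks_full() {
    awk '
        /^Data/ {
            total = ""; used = ""
            # Split on runs of spaces and commas, then pick out the
            # total=... and used=... fields wherever they appear.
            n = split($0, fields, "[ ,]+")
            for (i = 1; i <= n; i++) {
                if (fields[i] ~ /^total=/) total = substr(fields[i], 7)
                if (fields[i] ~ /^used=/)  used  = substr(fields[i], 6)
            }
            # Plain string comparison of the two reported values.
            if (total != "" && total == used)
                print "full: total=" total " used=" used
            else
                print "ok: total=" total " used=" used
        }
    '
}
```

With the filesystem mounted at /btrfs as in the question, you would run something like 'sudo btrfs filesystem df /btrfs | data_chunks_full'. A "full" result together with a discrepancy against plain 'df' would match the gotcha quoted above; an "ok" result does not rule out other causes.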

Also, BTrFS is a beta filesystem for a reason.