“No space left on device” error despite having plenty of space, on btrfs

disk-space-utilization, filesystems, hard-drive

Almost everywhere I'm getting failures in logs complaining about No space left on device

Gitlab logs:

==> /var/log/gitlab/nginx/current <==
2016-11-29_20:26:51.61394 2016/11/29 20:26:51 [emerg] 4871#0: open() "/var/opt/gitlab/nginx/nginx.pid" failed (28: No space left on device)

Dovecot email logs:

Nov 29 20:28:32 aws-management dovecot: imap(email@www.sitename.com): Error: open(/home/vmail/emailuser/Maildir/dovecot-uidlist.lock) failed: No space left on device

Output of df -Th

Filesystem     Type      Size  Used Avail Use% Mounted on
/dev/xvda1     ext4      7.8G  3.9G  3.8G  51% /
devtmpfs       devtmpfs  1.9G   28K  1.9G   1% /dev
tmpfs          tmpfs     1.9G   12K  1.9G   1% /dev/shm
/dev/xvdh      btrfs      20G   13G  7.9G  61% /mnt/durable
/dev/xvdh      btrfs      20G   13G  7.9G  61% /home
/dev/xvdh      btrfs      20G   13G  7.9G  61% /opt/gitlab
/dev/xvdh      btrfs      20G   13G  7.9G  61% /var/opt/gitlab
/dev/xvdh      btrfs      20G   13G  7.9G  61% /var/cache/salt

Looks like there is also plenty of inode space. Output of df -i

Filesystem     Inodes  IUsed  IFree IUse% Mounted on
/dev/xvda1     524288 105031 419257   21% /
devtmpfs       475308    439 474869    1% /dev
tmpfs          480258      4 480254    1% /dev/shm
/dev/xvdh           0      0      0     - /mnt/durable
/dev/xvdh           0      0      0     - /home
/dev/xvdh           0      0      0     - /opt/gitlab
/dev/xvdh           0      0      0     - /var/opt/gitlab
/dev/xvdh           0      0      0     - /var/cache/salt

Output of btrfs fi show

Label: none  uuid: 6546c241-e57e-4a3f-bf43-fa933a3b29f9
        Total devices 4 FS bytes used 11.86GiB
        devid    1 size 10.00GiB used 10.00GiB path /dev/xvdh
        devid    2 size 10.00GiB used 9.98GiB path /dev/xvdi
        devid    3 size 10.00GiB used 9.98GiB path /dev/xvdj
        devid    4 size 10.00GiB used 9.98GiB path /dev/xvdk

Output of btrfs fi df /mnt/durable

Data, RAID10: total=17.95GiB, used=10.12GiB
Data, single: total=8.00MiB, used=0.00
System, RAID10: total=16.00MiB, used=16.00KiB
System, single: total=4.00MiB, used=0.00
Metadata, RAID10: total=2.00GiB, used=1.74GiB
Metadata, single: total=8.00MiB, used=0.00
unknown, single: total=272.00MiB, used=8.39MiB

What could be the cause of this? I'm using a base Amazon Linux AMI on EC2, kernel version 4.4.5-15.26.amzn1.x86_64

Update

Running the command suggested in the answer below, btrfs fi balance start -dusage=5 /mnt/durable, gave me the following error:

ERROR: error during balancing '/mnt/durable' - No space left on device
There may be more info in syslog - try dmesg | tail

After manually deleting a number of larger files totaling ~1GB, I rebooted the machine and tried again (making sure to use sudo), and the command completed. I then rebooted the machine once more for good measure, and that seems to have solved the problem.

Best Answer

Welcome to the world of BTRFS. It has some tantalizing features but also some infuriating issues.

First, some information about your setup: it looks like you have four drives in a BTRFS "raid 10" volume, so all data is stored twice, on different disks. This BTRFS volume is then carved up into subvolumes mounted at different points. The subvolumes share a single pool of disk space but have separate inode numbers and can be mounted in different places.

BTRFS allocates space in "chunks"; each chunk is dedicated to a specific class of either data or metadata. What can happen (and looks like what has happened in your case) is that all free space gets allocated to data chunks, leaving no room for metadata.

It also seems that, for reasons I don't fully understand, BTRFS "runs out" of metadata space before the reported proportion of metadata space used reaches 100%.

This appears to be what has happened in your case: there is plenty of free data space, but no free space that has not already been allocated to chunks, and insufficient free space within the existing metadata chunks.
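You can see this in the btrfs fi show output above: for every device, the "used" figure (space already allocated to chunks) is at or very near the device size. As a rough sketch, this awk snippet computes the unallocated space per device from that output (figures copied verbatim from the question; on a live system you would pipe btrfs fi show into it instead):

```shell
# Per-device unallocated space = size - used, where "used" means
# "allocated to chunks" (not "occupied by file data").
btrfs_show_output='
        devid    1 size 10.00GiB used 10.00GiB path /dev/xvdh
        devid    2 size 10.00GiB used 9.98GiB path /dev/xvdi
        devid    3 size 10.00GiB used 9.98GiB path /dev/xvdj
        devid    4 size 10.00GiB used 9.98GiB path /dev/xvdk'

echo "$btrfs_show_output" | awk '
/devid/ {
    size = $4 + 0; used = $6 + 0;   # awk numeric coercion drops the GiB suffix
    printf "%s unallocated: %.2f GiB\n", $8, size - used
}'
```

The result is 0.00–0.02 GiB unallocated per device, i.e., effectively nothing left for new metadata chunks, even though the data chunks themselves are far from full.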

The fix is to run a "rebalance". This moves data around so that some chunks can be returned to the "global" free pool, where they can be reallocated as metadata chunks:

btrfs fi balance start -dusage=5 /mnt/durable

The number after -dusage sets how aggressive the rebalance is, that is, how close to empty a chunk has to be before it gets rewritten. If the balance reports that it relocated 0 chunks, try again with a higher -dusage value.
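That retry-with-a-higher-threshold loop can be sketched as a small shell helper. This is illustrative, not a polished tool: the threshold ladder is arbitrary, and the second argument lets you pass the actual command to run (e.g. "sudo btrfs"):

```shell
# Try `btrfs balance start` with progressively larger -dusage
# thresholds until one succeeds. $1 = mount point, $2 = command
# to invoke (defaults to plain `btrfs`).
balance_escalate() {
    mnt=$1
    cmd=${2:-btrfs}
    for pct in 0 5 10 20 40; do
        if $cmd balance start -dusage="$pct" "$mnt" 2>/dev/null; then
            echo "balance succeeded at dusage=$pct"
            return 0
        fi
    done
    echo "balance failed at every threshold" >&2
    return 1
}
```

Usage would be something like balance_escalate /mnt/durable "sudo btrfs". Starting at 0 is cheap because it only reclaims completely empty chunks; each higher value rewrites fuller chunks and therefore takes longer.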

If the balance fails, I would try rebooting and/or freeing up some space by removing files.
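To find candidates for deletion, something along these lines lists the largest files on the volume (the mount point is the one from the question; the 100M cutoff is an arbitrary illustration):

```shell
# List files over 100 MiB under the btrfs mount, largest first.
# MNT and the size threshold are placeholders; adjust as needed.
MNT="${MNT:-/mnt/durable}"
find "$MNT" -xdev -type f -size +100M -exec du -h {} + 2>/dev/null |
    sort -rh | head -n 10
```

Deleting even a gigabyte or so (as the update above describes) can free a whole chunk, which is often enough headroom for the balance to get started.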