Why isn’t disk space reduced after deleting emails through IMAP with Dovecot?


I'm running an email server with OpenSMTPD and Dovecot on Linux and accessing emails using IMAP with a Thunderbird client. When I delete an email in Thunderbird, why doesn't disk space usage go down?

As an example, one user's mbox files are stored in /var/vmail/${domain}/${user}/:

$ ls
Archives  Drafts  inbox  Sent  Spam  TrainSpam  Trash

I'm not sure whether mbox files can be sparse, so instead of du (which also shows the issue) I used ls, which I expect reports the most accurate "effective" (apparent) file sizes, and added up the sizes of all files in this directory:

$ ls -al | grep vmail | awk '{print $5}' | paste -sd+ | bc
1119217444
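A sketch of the same measurement that sums apparent sizes (what `ls -l` reports) without relying on `grep vmail` matching the owner column. It assumes GNU `find` for `-printf`, and `mbox_total_bytes` is a hypothetical helper name:

```shell
# Sum the apparent sizes (bytes) of the regular files directly inside a
# mailbox directory. Unlike `ls -al | grep vmail`, this does not depend on
# the owner name, and it skips the "." and ".." entries that ls -al lists.
# Assumes GNU find (for -printf).
mbox_total_bytes() {
    find "${1:-.}" -maxdepth 1 -type f -printf '%s\n' |
        awk '{ total += $1 } END { print total + 0 }'
}
```

Run it as `mbox_total_bytes /var/vmail/example.com/user` before and after the deletion, where `example.com/user` stands in for the real domain and user.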

Next, in Thunderbird I delete a large email with an attachment, shown as 1 MB. Thunderbird moves it to the Deleted folder; I then open that folder, delete it there, confirm the permanent-deletion dialog, and re-count the file sizes:

$ ls -al | grep vmail | awk '{print $5}' | paste -sd+ | bc
1119217443

So it went down by 1 byte. Perhaps the message is just being marked as deleted? How do I actually get the disk space back? I understand this may be non-trivial, since an mbox file is just one huge, flat file.

Best Answer

In the MBOX format, messages are stored one after another in a single large file, with a very simple structure:

From envelope-sender@example.com  Sat Nov 10 06:00:00 2018
From: Author <author@example.com>
To: Recipient <recipient@example.com>
Subject: Sample message 1

Message body.
>From is escaped. Otherwise it would break the MBOX file.

From envelope-sender@example.net  Sat Nov 10 06:30:00 2018
From: Author <author@example.net>
To: Recipient <recipient@example.com>
Subject: Sample message 2

Another message body.
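To make the structure concrete, here is a minimal shell sketch of two consequences of this layout: counting messages by their `From ` separator lines, and quoting body lines so they cannot be mistaken for a separator. The quoting shown is the mboxrd convention (add one `>` to any line matching `^>*From `); other mbox variants quote differently, and both function names are hypothetical:

```shell
# Count messages: every message begins with a "From " separator line.
# Body lines that start with "From " are expected to be quoted already.
count_messages() {
    grep -c '^From ' "$1"
}

# Quote a message body before appending it to an mbox (mboxrd style):
# a line starting with "From ", optionally preceded by ">" characters,
# gets one more ">" so it cannot be read as a message separator.
quote_body() {
    sed 's/^\(>*From \)/>\1/'
}
```

For example, `printf 'From is escaped.\n' | quote_body` yields the `>From is escaped.` line seen in the first sample message.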

Therefore, deleting a message from the middle of the file requires rewriting everything after it. That hurts performance and risks data integrity: the file may become corrupted if the write is interrupted.

One solution is to flag the message as deleted instead of actually removing it, because that only modifies one line and leaves the rest of the file intact. Multiple deletions can then be combined into a single rewrite later.
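Thunderbird calls that deferred single rewrite "compacting". Below is a toy sketch of it, assuming the deletion flag is an `X-Status:` header containing `D` (Dovecot's convention for mbox). It streams the mbox once, drops flagged messages, writes the result to a temporary file in the same directory, and renames it over the original, so an interrupted run leaves the old file intact. `compact_mbox` is hypothetical, ignores `Content-Length:` headers, and does no locking, so it is for illustration only and must not be pointed at a live Dovecot mailbox:

```shell
# Hypothetical helper, for illustration only: it does no locking and
# ignores Content-Length headers, so never run it on a mailbox that
# Dovecot or a delivery agent may be using.
compact_mbox() {
    mbox=$1
    # Temporary file in the same directory, so the final mv is an atomic
    # rename(2) on the same filesystem.
    tmp=$(mktemp "${mbox}.XXXXXX") || return 1
    if awk '
        # Emit the buffered message unless it was flagged deleted.
        function flush() {
            if (buf != "" && !deleted) printf "%s", buf
            buf = ""; deleted = 0; in_headers = 1
        }
        # A "From " line at the start of the file or after a blank line
        # starts a new message.
        /^From / && (NR == 1 || prev == "") { flush() }
        {
            # The first blank line ends the header block.
            if (in_headers && $0 == "") in_headers = 0
            if (in_headers && $0 ~ /^X-Status:.*D/) deleted = 1
            buf = buf $0 "\n"
            prev = $0
        }
        END { flush() }
    ' "$mbox" > "$tmp"; then
        mv "$tmp" "$mbox"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

The whole file is still rewritten, but only once for any number of flagged messages. Note that the replacement file gets `mktemp`'s restrictive 0600 mode rather than the original's permissions.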

MozillaZine's article on Compacting folders explains this from the Thunderbird perspective:

When you delete messages in an email client such as Thunderbird they aren't physically deleted. Even emptying the Trash does not get rid of them. Instead they are marked for deletion and hidden from view. They are not physically removed until you "compact" the folder. This is a tradeoff done to improve performance in large folders.

Dovecot's article on the Mbox Mailbox Format explains how Dovecot handles the problems of the MBOX format. The deletion is recorded in an X-Status: D header added to the message headers.

Dovecot uses C-Client (ie. UW-IMAP, Pine) compatible headers in mbox messages to store metadata. These headers are:

  • X-IMAPbase: Contains UIDVALIDITY, last used UID and list of used keywords
  • X-IMAP: Same as X-IMAPbase but also specifies that the message is a "pseudo message"
  • X-UID: Message's allocated UID
  • Status: R (\Seen) and O (non-\Recent) flags
  • X-Status: A (\Answered), F (\Flagged), T (\Draft) and D (\Deleted) flags
  • X-Keywords: Message's keywords
  • Content-Length: Length of the message body in bytes

Whenever any of these headers exist, Dovecot treats them as its own private metadata. It does sanity checks for them, so the headers may also be modified or removed completely. None of these headers are sent to IMAP/POP3 clients when they read the mail.
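Because these flags are ordinary header lines on disk (hidden only from IMAP/POP3 clients), they can be inspected directly. A read-only sketch that prints the `X-UID` of every message whose `X-Status` contains `D`; `list_deleted_uids` is a hypothetical name, and it relies on the `From ` separator lines while ignoring `Content-Length:`:

```shell
# Print the X-UID of every message whose X-Status header carries the
# D (\Deleted) flag. Read-only: the mbox file is not modified.
list_deleted_uids() {
    awk '
        # A new message starts: reset the per-message state.
        /^From / && (NR == 1 || prev == "") { in_headers = 1; uid = ""; del = 0 }
        # End of the header block: report the message if it was flagged.
        in_headers && $0 == "" {
            if (del) print (uid == "" ? "(no X-UID)" : uid)
            in_headers = 0
        }
        in_headers && /^X-UID:/ { uid = $2 }
        in_headers && /^X-Status:/ && $2 ~ /D/ { del = 1 }
        { prev = $0 }
    ' "$1"
}
```

For example, `list_deleted_uids /var/vmail/example.com/user/Trash` (with a placeholder domain and user) lists messages that are flagged deleted but still occupy space in the file.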
