Automatically rescan email with SpamAssassin after it has been received

exim spam spamassassin

For the past few months, the amount of spam I've been receiving has been driving me crazy. Despite running SpamAssassin (with RBL checks) on my Exim4 mail server, a lot of it still finds its way into our mailboxes.

I've noticed that the vast majority of the offending senders do end up in the RBLs, but only after their mail has already been scanned and found harmless. Typically the spam score of these mails is 0.0 – 1.1 when I receive them, while some time later the score would be much higher.

I did some searching, but couldn't find anything that seemed usable, so I threw together a little program that goes through the most recent mails in my mailbox and lets SpamAssassin reanalyze them. The results were staggering: almost every spam mail that had ended up in my mailbox passed the 5.0 threshold 5 to 10 minutes after delivery. Sometimes it takes a bit longer than that, but so far every one of them has passed the threshold eventually.

That isn't all that useful when you're actively reading the mailbox, but it would significantly reduce the time we spend manually deleting these mails in the morning when we first check our email.

The problem is that the program I made runs separately and uses IMAP to connect to a specific mailbox and make the changes. This makes it difficult to use this solution for other users, because I would need to have their passwords.

Are there any existing tools that allow me to reprocess mail that has already been received? I'm using the Maildir format on disk if that matters. If no such tool exists, a library to directly access and modify Maildir mailboxes might also do the trick.

I do not want to delay delivery of email (by greylisting or otherwise) because that delay would have to be at least 10 minutes to be effective, which would be prohibitive during business hours.

Best Answer

This is indeed a very good technique, especially for fighting snowshoe spam, a type of spam where the entire email blast is out the door in a matter of minutes. It works because anti-spam services take about that long to process what arrives and then publish their updated spam definitions.

I don't know of any ready-to-use software that can do this locally, but IMAP Spam Begone may suit your needs. It connects to your mailbox server via IMAP (the way a standard mail client would) and runs SpamAssassin to clean it up for you.


If you wanted something that runs locally, you could probably write a simple wrapper around SpamAssassin that does this. Maildir stores each message in its own file, so something like this should be decent:

Contents of sa-bootstrap.sh:

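#!/bin/sh
# Move every message that SpamAssassin classifies as spam to the spam folder.
for email in "$@"; do
  # -e makes spamassassin exit non-zero when the message is spam
  if ! spamassassin -e < "$email" > /dev/null 2>&1; then
    mv "$email" /full/path/to/spam/folder
  fi
done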

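Now you can run it against the new/ and cur/ subdirectories (tmp/ holds messages that are still being delivered), assuming sa-bootstrap.sh is executable and on your PATH:

find /path/to/maildir/new /path/to/maildir/cur -type f -print0 | xargs -0 sa-bootstrap.sh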

Don't forget to verify the messages that were moved, and to run sa-learn on your spam and ham before deleting them.
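For the training step, something along these lines might do. It is only a sketch: the helper name train_bayes is made up for illustration, and it assumes both arguments are directories of messages you have already checked by hand. Note that sa-learn updates the Bayes database of whichever user runs it.

```shell
#!/bin/sh
# train_bayes SPAMDIR HAMDIR  (hypothetical helper name)
# Feeds hand-verified messages to SpamAssassin's Bayes classifier.
train_bayes() {
    # --spam / --ham tell sa-learn how to classify everything in the directory.
    sa-learn --spam "$1" &&
    sa-learn --ham "$2"
}
```

You would call it as, for example, train_bayes /full/path/to/spam/folder /path/to/maildir/cur, once you are sure both directories contain what you think they contain.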

(spamassassin -e will exit with a non-zero exit code when the given message is determined to be spam.)
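Since the question is really about rescanning only recently delivered mail, here is a rough variant that limits the scan by age. Treat it as a sketch under a few assumptions: the function name rescan_recent is hypothetical, -mmin is an extension found in GNU and BSD find rather than POSIX, and the second argument is whatever directory you want flagged mail moved to (for a Maildir spam folder that would presumably be its cur/ subdirectory). It also refuses to run if spamassassin is not installed, because a missing binary would otherwise look like a non-zero exit and every message would be moved.

```shell
#!/bin/sh
# rescan_recent MAILDIR DESTDIR MINUTES  (hypothetical helper name)
# Re-runs SpamAssassin on messages modified within the last MINUTES minutes
# and moves the ones it now classifies as spam into DESTDIR.
rescan_recent() {
    maildir=$1
    destdir=$2
    minutes=$3

    # Bail out early: without this check a missing spamassassin would make
    # every message look like spam.
    command -v spamassassin > /dev/null 2>&1 || {
        echo "spamassassin not found" >&2
        return 2
    }

    # Only new/ and cur/ are scanned; tmp/ holds mail still being delivered.
    # -mmin is a GNU/BSD find extension (assumption about your platform).
    find "$maildir/new" "$maildir/cur" -type f -mmin "-$minutes" |
    while IFS= read -r msg; do
        # -e: exit non-zero when the message is classified as spam.
        if ! spamassassin -e < "$msg" > /dev/null 2>&1; then
            mv "$msg" "$destdir/"
        fi
    done
}
```

Run from cron every few minutes as the mailbox owner (or root), a call such as rescan_recent /path/to/maildir /full/path/to/spam/folder 60 would cover the 5 to 10 minute window described in the question without needing anyone's IMAP password.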