Bayesian filtering for Exchange 2010

exchange-2010 spam-filter spamassassin

So here's the deal.

Basically, I am looking for a convenient way to get spam from my Exchange 2010 mail server to my spam filter/proxy (whatever you want to call it) in a mail format for SpamAssassin to do its Bayesian filtering (Maildir or Mbox apparently).

I have created a gateway that filters mail and then passes it through to my Exchange server, as per this tutorial. From the research I have done it should be easy to apply the Bayesian filtering once the mail is in a format that it knows how to use:

sa-learn --mbox --spam ~/mbox/spam ~/mbox/bad-spam

Essentially, SpamAssassin needs a certain number of spam and ham emails to do its thing, and I was thinking I could have users dump their spam into a public folder.
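For reference, the "certain number" is 200 spam and 200 ham messages with SpamAssassin's default settings (bayes_min_spam_num and bayes_min_ham_num); below that, the Bayes rules do not fire. A rough sketch for checking progress follows. It assumes that `sa-learn --dump magic` prints the counters with the value in the third column and the name (nspam, nham) as the last word of the line; the helper name bayes_ready is made up for this example.

```shell
#!/bin/sh
# bayes_ready MIN: reads `sa-learn --dump magic` output on stdin, prints the
# learned message counts, and succeeds only if both reach MIN (default 200).
# Hypothetical helper; the column layout of the dump is an assumption.
bayes_ready() {
    awk -v min="${1:-200}" '
        $NF == "nspam" { spam = $3 }   # third column holds the counter value
        $NF == "nham"  { ham  = $3 }
        END {
            printf "spam=%d ham=%d\n", spam, ham
            exit !(spam + 0 >= min + 0 && ham + 0 >= min + 0)
        }'
}

# Usage on the gateway (not executed here):
#   sa-learn --dump magic | bayes_ready && echo "Bayes has enough data"
```

If the database lives somewhere non-default, pass the same --dbpath to the `sa-learn --dump magic` call that is used for learning.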

My initial inclination was to use something like IMAP2mbox and then dump the result into a directory on the SpamAssassin gateway. Therein lies my problem: this seems to have become tricky with Exchange Server 2010, as there is no longer built-in support for accessing public folders over IMAP.

I am stumped trying to come up with a way to get spam emails from my Exchange server into a format that SpamAssassin can use.

I am guessing there is a similar way to do this, but I'm not sure where to look next.

Best Answer

Exchange 2010 might allow using the DOMAIN/user/mailbox notation for accessing other users' mailboxes through IMAP. According to KB937359, this feature was originally removed in Exchange 2007 but reintroduced in SP1 Rollup 4, so it would be worth a try.

There is also DavMail, which might be of some help: it gateways standard internet mail protocols through to Exchange over WebDAV or EWS. I have not tried it, but accessing other users' mailboxes might work there using the DOMAIN\USERNAME\MAILBOX notation, and public folders are apparently accessible as well.

The basic idea for getting the spam into SpamAssassin is to set up fetchmail on your Postfix/amavisd box to retrieve it and feed it to sa-learn. Make sure to specify the right database path for sa-learn so that the updated Bayes database is the one amavis actually uses. On an Ubuntu system, the command should look like this:

/usr/bin/fetchmail -a -n -m '/usr/bin/sa-learn --dbpath /var/lib/amavis/.spamassassin --spam'

with your .fetchmailrc containing the necessary information for username, password, mailbox to access and the folder to fetch:

poll your.exchange.server protocol IMAP user "DOMAIN/spamadmin/user1" with password "spamadmin-password" folder "SPAM"
poll your.exchange.server protocol IMAP user "DOMAIN/spamadmin/user2" with password "spamadmin-password" folder "SPAM"
poll your.exchange.server protocol IMAP user "DOMAIN/spamadmin/user3" with password "spamadmin-password" folder "SPAM"

Specifying the -v parameter for the fetchmail command and the -D parameter for sa-learn will give you some debug output. The fetchmail docs contain more useful information and some examples for a working fetchmail configuration.
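If you want to train ham the same way, a small wrapper keeps the two runs consistent. This is only a sketch under a few assumptions: the function names, the DRY_RUN switch and the idea of a second rc file for ham folders are mine, not anything fetchmail or SpamAssassin provides. The point it illustrates is that --spam or --ham has to be part of the string handed to fetchmail's -m option, because fetchmail itself does not know those flags.

```shell
#!/bin/sh
# Bayes database used by amavis (path taken from the command above).
DBPATH="${DBPATH:-/var/lib/amavis/.spamassassin}"

# learn_cmd spam|ham: print the sa-learn command line fetchmail should use
# as its MDA. Hypothetical helper for this example.
learn_cmd() {
    case "$1" in
        spam|ham) printf '/usr/bin/sa-learn --dbpath %s --%s' "$DBPATH" "$1" ;;
        *) echo "usage: learn_cmd spam|ham" >&2; return 2 ;;
    esac
}

# fetch_and_learn spam|ham RCFILE: fetch every message from the folders
# listed in RCFILE and pipe each one to sa-learn.
# With DRY_RUN set, the fetchmail command is only printed, not run.
fetch_and_learn() {
    mda=$(learn_cmd "$1") || return
    # -a: all messages, -n: no header rewriting, -f: rc file, -m: MDA command
    ${DRY_RUN:+echo} /usr/bin/fetchmail -a -n -f "$2" -m "$mda"
}

# Usage (not executed here; the rc file names are placeholders):
#   fetch_and_learn spam "$HOME/.fetchmailrc-spam"
#   fetch_and_learn ham  "$HOME/.fetchmailrc-ham"
```

Running it once with DRY_RUN=1 shows the exact fetchmail invocation before anything is fetched from the Exchange server.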