Spamassassin rule to compare return-path and from fields

postfix spamassassin

Recently I noticed a recurring pattern in some of the spam I receive.
The return-path and from headers always have the same structure.

Let me explain with an example:

Return-path: <brian_chambers-USER=DOMAIN.COM@readyseas.com>
From: <brian_chambers@readyseas.com>
To: <USER@DOMAIN.COM>

Basically, I'd like to check whether the Return-path user part equals the From user part, followed by a dash and the To address with "@" changed to "=".

I wanted to use Postfix header_checks to reject the USER=DOMAIN.COM@ pattern, but most of the legitimate newsletters I receive also contain it in their Return-path (except that it is preceded by a much more complex string, which never matches the From field).

Has anybody created such a rule before and cares to share?

Thanks!

Best Answer

SpamAssassin does not let you write code or assign variables in its rules. To do what you want, you'd be best served by writing a custom plugin (which would give you full access to Perl).

That said, you can technically do what you're asking for within the SpamAssassin rule writing syntax by using header type ALL (which examines all headers at once, kind of like rawbody rules):

header RPATH_EMBEDS_TO_ADDR  ALL =~ /\bReturn-Path:[^\r\n]{0,99}-([\w.]{1,99})=([\w.-]{1,99}\.[a-z]{2,8})\@(?:[^\r\n]{0,99}[\r\n]{1,9}){1,30}To:[^\r\n]{0,99}<\1\@\2>/ism

The above rule is expensive, and would be even more expensive if you were to allow for dashes in the username, since it would have to iterate over all possible lengths of ([\w.-]{1,99}) for the username. This is expensive not just because it requires lots of backtracking, but also because it requires examining very long strings. Also, the Return-Path header may appear after the To header, in which case you'd need a second rule with a second regex to handle that ordering.
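To see how a pattern like this behaves, and why header order matters, it can be exercised outside SpamAssassin. The following is only a sketch in Python, whose `re` module accepts this syntax; SpamAssassin itself evaluates rules with Perl's regex engine. The sample headers come from the question, the names `ALL_RULE` and `rule_hits` are made up for illustration, and the username group is assumed to carry a `{1,99}` quantifier.

```python
import re

# The ALL-header pattern, assuming a {1,99} quantifier on the username group.
# re.IGNORECASE mirrors the /i flag of the rule.
ALL_RULE = re.compile(
    r"\bReturn-Path:[^\r\n]{0,99}-([\w.]{1,99})=([\w.-]{1,99}\.[a-z]{2,8})@"
    r"(?:[^\r\n]{0,99}[\r\n]{1,9}){1,30}To:[^\r\n]{0,99}<\1@\2>",
    re.IGNORECASE,
)

def rule_hits(headers: str) -> bool:
    """Return True if the pattern matches anywhere in the header block."""
    return ALL_RULE.search(headers) is not None

# Headers from the question, with Return-Path before To.
spam_headers = (
    "Return-Path: <brian_chambers-USER=DOMAIN.COM@readyseas.com>\n"
    "From: <brian_chambers@readyseas.com>\n"
    "To: <USER@DOMAIN.COM>\n"
)

# Same headers, but with To before Return-Path.
reordered_headers = (
    "To: <USER@DOMAIN.COM>\n"
    "Return-Path: <brian_chambers-USER=DOMAIN.COM@readyseas.com>\n"
    "From: <brian_chambers@readyseas.com>\n"
)

print(rule_hits(spam_headers))       # the pattern finds To after Return-Path
print(rule_hits(reordered_headers))  # no To header follows Return-Path here
```

The second call is the case that would need its own rule: the pattern only looks for To after Return-Path.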

You'd be far better off writing a custom SpamAssassin plugin for this technique.
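The comparison such a plugin would perform is plain string handling with no backtracking. As a rough illustration of the logic only, here is a sketch in Python; a real plugin would be written in Perl against SpamAssassin's plugin API, and the function names and the `lists.example.org` address below are hypothetical.

```python
def bare_address(value: str) -> str:
    """Return the address inside <...>, or the stripped value if unbracketed."""
    start, end = value.find("<"), value.rfind(">")
    if 0 <= start < end:
        return value[start + 1:end].strip()
    return value.strip()

def rpath_embeds_from_and_to(return_path: str, from_hdr: str, to_hdr: str) -> bool:
    """Check: Return-Path user part == From user part + '-' + To with '@' -> '='."""
    rpath = bare_address(return_path)
    sender = bare_address(from_hdr)
    rcpt = bare_address(to_hdr)
    # All three values must look like addresses before comparing.
    if not all("@" in addr for addr in (rpath, sender, rcpt)):
        return False
    rpath_user = rpath.rpartition("@")[0]
    from_user = sender.rpartition("@")[0]
    expected = f"{from_user}-{rcpt.replace('@', '=')}"
    # Compare case-insensitively, as the /i flag does in the regex rules.
    return rpath_user.lower() == expected.lower()

# The example from the question.
print(rpath_embeds_from_and_to(
    "<brian_chambers-USER=DOMAIN.COM@readyseas.com>",
    "<brian_chambers@readyseas.com>",
    "<USER@DOMAIN.COM>",
))

# A hypothetical newsletter whose bounce address has a longer prefix.
print(rpath_embeds_from_and_to(
    "<bounce-8a7f3c-USER=DOMAIN.COM@lists.example.org>",
    "News <news@example.org>",
    "<USER@DOMAIN.COM>",
))
```

Because the check compares whole user parts, it works regardless of header order and is unaffected by dashes in the username.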

However, I think you'll quickly find that all this does is target certain types of bulk mail, many of which are legitimate; the Return-Path header is used as a bounce address and many mailing lists encode the recipient into it in order to measure their deliverability and clean up their lists.

If you really want this sort of thing, I suspect it doesn't actually matter if the exact To address is the one present in the Return-Path header. Here is a dramatically faster rule that should have nearly the same efficacy:

header RPATH_EMBEDS_ADDR  Return-Path =~ /-[\w.]{1,99}=[\w.-]{1,99}\.[a-z]{2,8}\@/i
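Before deploying a rule like this, it may be worth trying the pattern against a few Return-Path values from your own mail. The sketch below does that in Python, on the assumption that Python's `re` treats this pattern the same way Perl does. The `lists.example.org` address is a made-up stand-in for a legitimate newsletter, and `rpath_rule_hits` is a hypothetical helper name.

```python
import re

# Same pattern as the Return-Path rule; re.IGNORECASE mirrors /i.
RPATH_RULE = re.compile(r"-[\w.]{1,99}=[\w.-]{1,99}\.[a-z]{2,8}@", re.IGNORECASE)

def rpath_rule_hits(return_path: str) -> bool:
    """Return True if the Return-Path value embeds a '-user=domain.tld@' token."""
    return RPATH_RULE.search(return_path) is not None

samples = [
    "<brian_chambers-USER=DOMAIN.COM@readyseas.com>",     # spam from the question
    "<bounce-8a7f3c-USER=DOMAIN.COM@lists.example.org>",  # hypothetical newsletter
    "<brian_chambers@readyseas.com>",                     # no embedded recipient
]

for value in samples:
    print(value, rpath_rule_hits(value))
```

The newsletter-style address hits as well, which is the false-positive risk described above, so a low score is probably the safer starting point.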

Another important caveat is that whenever a message is redirected (e.g. by an email forwarding service), the Return-Path header is rewritten. This may limit the spam detection utility of that rule.