Postfix Mail Filter Stops Delivering Mail

amavis diagnostic email postfix troubleshooting

I have an Internet-facing Postfix mail filter (Debian Lenny) that sits in front of all the other mail servers on our network and scans our mail using amavisd-new, ClamAV, SpamAssassin, and policy-weightd.

This server was set up and configured using the document found here: http://www200.pair.com/mecham/spam/spamfilter20090215.html
(I also set up the Bayesian and AWL lists with MySQL, and installed policy-weightd as described on the same site.)

These servers (I have two) ran well for a couple of years on Debian Etch, but this latest install locks up about once per day, at varying times, and I can't figure out why.

Details of problem

  1. The mail queues up on the server, and running mailq lists a bunch of items with (delivery temporarily suspended: conversation with 127.0.0.1[127.0.0.1] timed out while receiving the initial server greeting)
  2. Running amavisd-nanny freezes, and I have to log out of the ssh session. On a working system, amavisd-nanny shows me the state of each amavisd process and occasionally finds stuck processes and terminates them (what causes these stuck processes?). I have set up a cron job to run amavisd-nanny hourly to clear them, but even that isn't enough to keep things running.
  3. ps -ef|grep amavisd lists all of my amavisd processes (12 of them) with (ch#-accept) after them. On a working system these say either (virgin child) or (ch#-avail)
  4. Memory, disk space, and the number of Postfix processes do not appear to be the problem.

What should I be doing to further diagnose my problem? I am not looking for a workaround; I want to determine what is going wrong and fix it.

Best Answer

Run strace against one of the stuck processes. The system call it is blocked in might give you a hint about what's wrong.

strace -p PROCESSIDOFSTUCKPROCESS
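
If several children are stuck at once, something like the following could collect a short trace from each of them. This is only a rough sketch: it assumes the stuck children show up in ps with a title ending in "-accept)" as described in the question, the function names, the 10-second window, and the /tmp output path are made up for illustration, and it needs to run as root.

```shell
#!/bin/sh
# Sketch only: the function names, the /tmp path and the 10-second window are
# assumptions for illustration, not part of amavisd-new or strace.

# Read "PID ARGS" lines (as printed by `ps -eo pid,args`) on stdin and print
# the PID of every amavisd child whose process title ends in "-accept)".
# Matching on the second field keeps this awk process itself out of the result.
stuck_amavisd_pids() {
    awk '$2 ~ /amavisd/ && index($0, "-accept)") { print $1 }'
}

# Attach strace to each stuck child for about 10 seconds, then detach.
# -tt prints wall-clock timestamps, -T the time spent in each system call,
# -o writes the trace to a file instead of the terminal.
trace_stuck() {
    ps -eo pid,args | stuck_amavisd_pids | while read -r pid; do
        strace -tt -T -p "$pid" -o "/tmp/amavisd-strace.$pid" &
        tracer=$!
        sleep 10
        kill "$tracer" 2>/dev/null   # strace detaches from the process on SIGTERM
        wait "$tracer" 2>/dev/null
    done
}
```

After sourcing the file, call trace_stuck and look at the last line of each /tmp/amavisd-strace.PID file. A call that never returns (waiting on a lock, a socket, or a database connection, for example) is usually the place to start digging.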

Cheers