How Do I Troubleshoot Exim Mail Retry Issues

Tags: email, email-bounces, exim, performance-tuning, smtp

My server is hitting very high CPU load (almost pegged at 100%), so much that the Apache service can't run properly and we get Apache 500 errors. We used a script to catch this, and that is how we discovered that the server normally doesn't run many processes that look like "/usr/sbin/exim -Mc 1R6Nvz-0006CN-KI". When the problem occurs, however, we consistently find a large number of processes in memory like "/usr/sbin/exim -Mc 1R6Nvz-0006CN-KI". We contacted HostGator support, and they confirmed that the cause of the problem is Exim mail retries (which is what the -Mc switch is for) and not Apache, MySQL, or any other process. They agree with my conclusion that we should focus purely on Exim.

HostGator is going to grant me root access to this dedicated host today. I'm brand new to Exim, but I know Linux fairly well. Which logs, email directories, and Exim config files would you recommend I look at to troubleshoot a high rate of Exim mail retries? Note that this is a CentOS 5 Linux server with WHM/cPanel on it.

For instance, things I'd love to see:

  • the log file(s) for Exim activity, both successes and errors
  • one of the emails it's trying to retry, so I can crack it open and perhaps find a clue
  • the Exim config files, to see if there's a throttle we can apply so that the retries don't all happen at once but are spread over a longer period

Best Answer

Start by running the mailq and exiwhat commands to get a handle on what is happening. mailq will show what is queued. exiwhat will tell you what the running processes are doing.
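On a busy server the mailq listing can be long. The following is a minimal Python 3 sketch, not an Exim tool, that summarizes saved mailq output: how many messages are queued, how many are frozen, and which recipient domains are still pending. It assumes the usual layout of one line per message (age, size, message ID, sender in angle brackets, optionally "*** frozen ***") followed by indented recipient lines, with "D" marking recipients already delivered. The function name summarize_mailq is hypothetical; Exim's own exiqsumm utility produces a similar per-domain summary.

```python
import re
from collections import Counter

# First line of each queued message in "mailq" (exim -bp) output, e.g.
#   4d  1.3K 1R6Nvz-0006CN-KI <sender@example.com> *** frozen ***
HEADER_RE = re.compile(
    r"^\s*(?P<age>\d+[smhdw])\s+(?P<size>\S+)\s+(?P<id>\w+-\w+-\w+)\s+"
    r"<(?P<sender>[^>]*)>(?P<frozen>\s+\*\*\* frozen \*\*\*)?"
)


def summarize_mailq(text):
    """Summarize saved mailq output (hypothetical helper, not an Exim tool)."""
    messages = 0
    frozen = 0
    pending_domains = Counter()
    for line in text.splitlines():
        if not line.strip():
            continue  # blank lines separate messages
        match = HEADER_RE.match(line)
        if match:
            messages += 1
            if match.group("frozen"):
                frozen += 1
            continue
        # Any other non-blank line is an indented recipient address.
        recipient = line.strip()
        if recipient.startswith("D "):
            continue  # "D" marks a recipient that was already delivered
        if "@" in recipient:
            domain = recipient.rsplit("@", 1)[1].lower()
        else:
            domain = "(local)"
        pending_domains[domain] += 1
    return {"messages": messages, "frozen": frozen, "domains": pending_domains}
```

CentOS 5's system Python is much older than Python 3, so one option is to save the listing with "mailq > queue.txt" and run summarize_mailq(open("queue.txt").read()) on another machine. If a single domain dominates the pending list, that destination is a good first suspect for the repeated retries.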

Exim's log files are mainlog (all messages), rejectlog (rejected messages, with more detail), and paniclog (serious failures; rare). They should be under /var/log, possibly in /var/log/exim or /var/log/exim4.
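To see which deliveries keep being retried, one can count the delivery flags in mainlog. The Python 3 sketch below assumes Exim's standard mainlog line format (date, time, optional [pid], message ID, then a flag such as <= for an arrival, => for a delivery, == for a deferred delivery, or ** for a failure). The function name summarize_mainlog is hypothetical, and the log path is an assumption to check: on cPanel servers the main log is commonly /var/log/exim_mainlog.

```python
import re
from collections import Counter

# Standard mainlog line: date, time, optional [pid], message ID, flag, details.
LOG_RE = re.compile(
    r"^\S+ \S+ (?:\[\d+\] )?(?P<id>\w+-\w+-\w+) "
    r"(?P<flag><=|=>|->|==|\*\*|\*>) (?P<rest>.*)$"
)

FLAG_NAMES = {
    "<=": "arrival",
    "=>": "delivery",
    "->": "additional delivery",
    "==": "deferred",
    "**": "failed",
    "*>": "suppressed",
}


def summarize_mainlog(lines):
    """Count log flags, plus deferrals per recipient domain and per message."""
    flags = Counter()
    deferred_domains = Counter()
    deferred_messages = Counter()
    for line in lines:
        match = LOG_RE.match(line.rstrip("\n"))
        if not match:
            continue  # e.g. "Completed" lines or other events
        flag = match.group("flag")
        flags[FLAG_NAMES[flag]] += 1
        if flag == "==":
            # The first word after "==" is the address whose delivery was deferred.
            words = match.group("rest").split()
            address = words[0] if words else ""
            domain = address.rsplit("@", 1)[1].lower() if "@" in address else "(local)"
            deferred_domains[domain] += 1
            deferred_messages[match.group("id")] += 1
    return flags, deferred_domains, deferred_messages
```

For example, passing open("/var/log/exim_mainlog", errors="replace") to summarize_mainlog returns the three counters; a message ID with dozens of == entries is a natural candidate to open from the spool.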

Messages in the queue can be found in Exim's spool directory, likely /var/spool/exim4 or similar. The messages themselves are in its input subdirectory, and per-message status information is in the corresponding msglog subdirectory.
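To crack open a particular queued message, it helps to know which files hold it. This Python 3 sketch builds the paths of a message's -H (envelope and headers), -D (body) and msglog files. The spool directory is an assumption to check (cPanel installs commonly use /var/spool/exim), and the helper name spool_paths is hypothetical. When split_spool_directory is set, Exim places each message in a subdirectory named after the sixth character of its ID; the sketch assumes the classic 16-character ID format seen in the question. Exim can also display these files directly with exim -Mvh, -Mvb and -Mvl followed by the message ID.

```python
import posixpath


def spool_paths(spool_dir, message_id, split_spool_directory=False):
    """Return the spool files Exim keeps for one queued message.

    Hypothetical helper. Assumes the classic 16-character message ID; with
    split_spool_directory set, the subdirectory is the ID's sixth character.
    """
    subdir = message_id[5] if split_spool_directory else ""
    input_dir = posixpath.join(spool_dir, "input", subdir)
    msglog_dir = posixpath.join(spool_dir, "msglog", subdir)
    return {
        "header": posixpath.join(input_dir, message_id + "-H"),  # envelope + headers
        "data": posixpath.join(input_dir, message_id + "-D"),    # message body
        "msglog": posixpath.join(msglog_dir, message_id),        # delivery attempts
    }
```

Reading the msglog file as root shows each delivery attempt for that message and the error returned, which is often the quickest clue to why it keeps being retried.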

If you are getting a lot of retries, something is not configured correctly. My rant on Running an Email Server will give you an idea of what is required, and my posting on Detecting Email Server Forgery may give you an idea of why your queue is growing. In my experience, mail servers for web sites tend to be poorly configured. Take the time to get the configuration right; it isn't that difficult.

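A few configuration options which may help you include:

  • deliver_queue_load_max - stops delivering messages from the queue when the load average exceeds this value
  • queue_only_load - queues incoming messages instead of delivering them immediately when the load average exceeds this value
  • queue_run_max - limits the number of queue-runner processes running at the same time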

See the Exim Specification section 14 for more details.
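The difference between the first two load options can be modelled in a few lines. This Python 3 sketch is a simplified model, not Exim's code: it treats each threshold as "act when the load average is above this value", with None meaning the option is unset, and it renders candidate settings as main-section configuration lines. The function names and the example values (8, 12, 5) are hypothetical; suitable thresholds depend on the number of CPUs. On cPanel, exim.conf is generated, so such options are usually added through WHM's Exim Configuration Manager rather than by editing the file directly.

```python
def load_decisions(load, queue_only_load=None, deliver_queue_load_max=None):
    """Simplified model of how two Exim load thresholds react to one load value.

    A threshold of None means the option is unset.
    """
    return {
        # queue_only_load: above this, incoming mail is only queued,
        # not delivered immediately.
        "queue_incoming": queue_only_load is not None and load > queue_only_load,
        # deliver_queue_load_max: above this, queue runs stop delivering.
        "stop_queue_deliveries": (
            deliver_queue_load_max is not None and load > deliver_queue_load_max
        ),
    }


def render_settings(queue_only_load, deliver_queue_load_max, queue_run_max):
    """Format candidate values as main-section lines of an Exim configuration."""
    return "\n".join([
        f"queue_only_load = {queue_only_load}",
        f"deliver_queue_load_max = {deliver_queue_load_max}",
        f"queue_run_max = {queue_run_max}",
    ])
```

For example, load_decisions(os.getloadavg()[0], 8, 12) shows what those two thresholds would do at the current one-minute load. Setting queue_only_load below deliver_queue_load_max, as here, means new mail is held first while the queue keeps draining until the load climbs further.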