Sending email is slow to some domains

centos6 php5 sendmail

My system sends email from PHP using sendmail. I realize that there are dozens of questions on Stack Exchange about delays when sending email from sendmail/PHP. I have read them all, but I still can't solve my problem.

I have a production server and a development server. Production is in a data center and uses a 'smart host' to relay its emails, because direct sending is blocked by a firewall. Development is in our office and sends emails directly. They also use different DNS servers. Other than this, the servers are as similar as I could possibly make them. My system needs to send many (around 10,000) emails at a time. My development server is able to send all of them in well under an hour. The production system takes more than 24 hours. The emails are 2 KB to 5 KB in size.

On my production server:

I have determined that the hand-off from php to sendmail is fast. Sendmail is slow at sending emails to some domains. The big email services (gmail, aol, yahoo) are fast. I have tried to send 1000 emails to only aol addresses and they all sent within a couple minutes, so I know the relay is capable of sending emails quickly. My problem is that most of the email addresses in my address list are to small domains, and most of those are slow. This leads me to believe the slow step involves DNS. Here is an excerpt of my maillog file (values in {curly brackets} replaced for privacy).

Mar 28 11:21:47 {servernamereplaced} sendmail[26242]: v2SILe8w026242: to={usernamereplaced@domain1replaced.net}, ctladdr={customfromaddress@notmydomain.com} (48/48), delay=00:00:07, xdelay=00:00:00, mailer=relay, pri=32561, relay=[127.0.0.1] [127.0.0.1], dsn=2.0.0, stat=Sent (v2SILlIu026244 Message accepted for delivery) 
Mar 28 11:22:08 {servernamereplaced} sendmail[26247]: v2SILmT8026247: to={usernamereplaced@domain2replaced.net}, ctladdr={customfromaddress@notmydomain.com} (48/48), delay=00:00:20, xdelay=00:00:10, mailer=relay, pri=32550, relay=[127.0.0.1] [127.0.0.1], dsn=2.0.0, stat=Sent (v2SILwOh026248 Message accepted for delivery).

You can see there are long delays on these. However, when I run host -t mx domain1replaced.net I get a result very fast (less than half a second, and exactly as long as gmail.com or aol.com).

I've seen some answers mentioning the timeout values in sendmail.cf; however, I don't know enough about that file to mess around with it on my own. Also, the delay values aren't all the same or multiples of the same number. They seem to be scattered between 5 and 30 seconds. The thing that gets me is that my development server sends them so fast, but I can't get my production server to do the same thing. What can I do to speed up my production server?

Best Answer

Let's start by analyzing sendmail's log, especially the delay and xdelay fields. From here:

delay: The total message delay (the time difference between reception and final delivery or bounce). Format is delay=HH:MM:SS for a delay of less than one day and delay=days+HH:MM:SS otherwise.

xdelay: The total time the message took to be transmitted during final delivery. This differs from the delay= equate, in that the xdelay= equate only counts the time in the actual final delivery.

To better emphasize the difference between delay/xdelay (taken from here):

This differs from delay= in that delay= shows the total amount of time the message took, computed from when the message was originally received or queued (this could be days ago), until it was eventually delivered. In the case of SMTP mail, the xdelay= computation starts when sendmail starts trying to connect to the remote host.

Your logs show the following delay/xdelay values:

delay=00:00:07, xdelay=00:00:00: this message waited 7 seconds in sendmail's queue before the SMTP transfer even started. This is a sign of an overloaded server rather than a connection/DNS problem: after all, the actual transfer completed almost instantly;

delay=00:00:20, xdelay=00:00:10: this time the message stalled in sendmail's queue for 10 seconds, and the actual transfer took another 10 seconds. Depending on your server's reputation, this can be a perfectly reasonable value: many mail servers delay HELO/EHLO by 5-30 seconds for unknown or neutral senders.
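To apply the same reading to the whole maillog rather than two lines, something like the following Python sketch could help. It assumes the delay= and xdelay= fields follow the HH:MM:SS and days+HH:MM:SS formats quoted above; the function names parse_delays and queue_wait are made up for this illustration and are not part of sendmail.

```python
import re
from datetime import timedelta

# Matches delay=HH:MM:SS, delay=days+HH:MM:SS, and the same for xdelay.
# The leading \b keeps "delay" from matching inside "xdelay".
_FIELD = re.compile(r"\b(x?delay)=(?:(\d+)\+)?(\d+):(\d{2}):(\d{2})")


def parse_delays(line):
    """Return a dict mapping 'delay'/'xdelay' to timedelta for one log line."""
    found = {}
    for name, days, hours, minutes, seconds in _FIELD.findall(line):
        found[name] = timedelta(
            days=int(days or 0),
            hours=int(hours),
            minutes=int(minutes),
            seconds=int(seconds),
        )
    return found


def queue_wait(line):
    """Time spent before the final delivery attempt: delay minus xdelay.

    Returns None when the line does not carry both fields.
    """
    fields = parse_delays(line)
    if "delay" not in fields or "xdelay" not in fields:
        return None
    return fields["delay"] - fields["xdelay"]
```

Feeding every line of the maillog through queue_wait and comparing the result with xdelay shows whether most of the time is lost waiting locally (large delay minus xdelay) or during the transfer itself (large xdelay).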

So why is the test machine so much faster than your production machine? Here are some possibilities:

  • sending through a smart host is much simpler and faster than sending email directly. When sending from your dev server, are you sure that the emails are actually received in about one hour, or is that only the time your dev server takes to pass all the mail to the smarthost?
  • maybe your smarthost has a good reputation and remote servers accept its mail without imposing artificial waits;
  • sending many emails is very fsync()-intensive work. Are your dev and production servers really the same, in both hardware and software? What about the smarthost's hardware and software stack?
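One rough way to compare the two machines on the fsync() point is to time small synchronous writes on each. The Python sketch below is only an approximation of what a mail queue does (create a small file, write it, fsync it, remove it); the function name fsync_latency and the default count and size are arbitrary choices for this illustration.

```python
import os
import tempfile
import time


def fsync_latency(directory, count=200, size=4096):
    """Create `count` small files in `directory`, fsync each one,
    and return the mean wall-clock seconds spent per file."""
    payload = b"x" * size
    start = time.perf_counter()
    for _ in range(count):
        fd, path = tempfile.mkstemp(dir=directory)
        try:
            os.write(fd, payload)
            os.fsync(fd)  # force the data to stable storage, as an MTA does
        finally:
            os.close(fd)
            os.unlink(path)
    return (time.perf_counter() - start) / count
```

Run it on both servers against a directory on the same filesystem as the mail queue (commonly /var/spool/mqueue for sendmail, but check your own configuration). A large gap between the two results would point at storage rather than DNS or the network.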