Not receiving mails from some senders due to DNS config


I have noticed a peculiar behavior of my Google Apps domain. Most mail comes through as you would expect, but over time I have concluded that mail from certain senders never arrives. After identifying one such sender, I asked him to send me an email and forward the "delivery failure" response to my regular Gmail account.

The delivery failure response contained the following snippet:

----- Transcript of session follows -----
<myusername@GHS.L.GOOGLE.COM>... Deferred: Connection timed out with ghs.l.google.com.

A quick search for that error led me to this page on the Google Apps Help Forum, which helped me identify the problem. Indeed, when I checked the DNS records for my domain, @ was set to ghs.google.com. (CNAME), which it shouldn't be. Changing that to @ 74.125.93.121 (A)* resolved the problem.

I understand that in the cases where the mail wouldn't come through, my domain name was replaced by its canonical name through a CNAME lookup, so the mail was sent to myusername@ghs.l.google.com instead of myusername@mydomain.com. But why did it work for the vast majority of senders? Did the senders whose mail wouldn't come through use a different mail protocol, some unusual DNS settings, or something else?

From what I could see by researching the problem on Google, this seems to be a widespread issue (lots of people complaining about emails from battle.net not coming through would be one popular example), only people don't seem to be aware that the problem lies in their own DNS settings rather than on the sender's side.

So how can this be explained?

* I used this IP because of what I read here, but I think any IP would do the trick. Can anyone confirm this? Note that simply removing the @ record did not resolve the problem; it had to be changed.

Best Answer

From RFC 2821 "Simple Mail Transfer Protocol", section 5 "Address Resolution and Mail Handling":

The lookup first attempts to locate an MX record associated with the name. If a CNAME record is found instead, the resulting name is processed as if it were the initial name.

In general, this is how CNAMEs work. They are often mis-used, mis-understood, and mis-implemented. :-)

If your domain is example.com, you probably have existing MX records pointing to the usual Google Apps hosts.

example.com. MX 10 ASPMX.L.GOOGLE.COM.
example.com. MX 20 ALT1.ASPMX.L.GOOGLE.COM.
example.com. MX 20 ALT2.ASPMX.L.GOOGLE.COM.
example.com. MX 30 ASPMX2.GOOGLEMAIL.COM.
example.com. MX 30 ASPMX3.GOOGLEMAIL.COM.
example.com. MX 30 ASPMX4.GOOGLEMAIL.COM.
example.com. MX 30 ASPMX5.GOOGLEMAIL.COM.

It sounds like you also had an entry like this:

example.com. CNAME ghs.l.google.com.

RFC 1034 "Domain Names - Concepts and Facilities", in section 3.6.2 "Aliases and canonical names", recommends against this configuration:

If a CNAME RR is present at a node, no other data should be present; this ensures that the data for a canonical name and its aliases cannot be different.
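As a rough illustration of that rule, the following sketch scans a list of records and reports any name that holds a CNAME together with other data. The function name `find_cname_conflicts` and the tuple layout are assumptions made for this example, not part of any DNS tool.

```python
from collections import defaultdict

def find_cname_conflicts(records):
    """Return owner names that hold a CNAME alongside any other record type."""
    types_by_name = defaultdict(set)
    for name, rtype, _value in records:
        # DNS names are case-insensitive, so normalise before grouping.
        types_by_name[name.lower()].add(rtype.upper())
    return sorted(
        name for name, types in types_by_name.items()
        if "CNAME" in types and len(types) > 1
    )

# Hypothetical zone data mirroring the records discussed in this answer.
zone = [
    ("example.com.", "MX", "10 ASPMX.L.GOOGLE.COM."),
    ("example.com.", "CNAME", "ghs.l.google.com."),
    ("www.example.com.", "CNAME", "ghs.l.google.com."),
]

print(find_cname_conflicts(zone))
```

Only example.com. is reported: www.example.com. carries nothing but the CNAME, so it is fine. In a real zone the apex also carries SOA and NS records, which is why a CNAME at the bare domain always conflicts with something.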

In the case of the error you pasted, the mail server and/or DNS server on the sending end attempted to look up MX record(s) for your domain, example.com, and found a CNAME pointing to ghs.l.google.com. It then tried to look up the MX record(s) for ghs.l.google.com. That domain does not currently have any MX records, so the mail server would have fallen through to the A record for ghs.l.google.com. That IP address was not listening on the SMTP port, so the result is the error "Connection timed out with ghs.l.google.com."
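The sequence described above can be sketched roughly as follows. This is a simplified model of one sender that follows the CNAME before anything else, not the code of any particular mail server; the dictionary layout, the function name `delivery_targets`, and the 192.0.2.x placeholder addresses are all assumptions for illustration.

```python
def delivery_targets(domain, dns, max_hops=8):
    """Return (canonical_name, hosts_to_try) for a sender that follows CNAMEs first."""
    name = domain
    for _ in range(max_hops):
        cname = dns.get((name, "CNAME"))
        if not cname:
            break
        name = cname[0]  # process the canonical name as if it were the initial name
    mx = dns.get((name, "MX"))
    if mx:
        # MX values are (preference, host); lowest preference is tried first.
        return name, [host for _pref, host in sorted(mx)]
    if dns.get((name, "A")):
        return name, [name]  # no MX records: fall back to the name's own A record
    return name, []

# Hypothetical data for the broken setup: the CNAME shadows the MX records.
broken = {
    ("example.com.", "CNAME"): ["ghs.l.google.com."],
    ("example.com.", "MX"): [(10, "aspmx.l.google.com."), (20, "alt1.aspmx.l.google.com.")],
    ("ghs.l.google.com.", "A"): ["192.0.2.10"],
}

# The same zone after replacing the CNAME with an A record.
fixed = {
    ("example.com.", "A"): ["192.0.2.20"],
    ("example.com.", "MX"): [(10, "aspmx.l.google.com."), (20, "alt1.aspmx.l.google.com.")],
}

print(delivery_targets("example.com.", broken))
print(delivery_targets("example.com.", fixed))
```

With the broken data the sender ends up trying ghs.l.google.com. itself, which matches the host named in the bounce. With the fixed data it reaches the MX hosts in preference order.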

By removing the CNAME record, you've fixed your mail problems. You might encounter issues if the IP address you've defined in its place is changed on Google's end.

You could instead define the CNAME for www.example.com:

www.example.com. CNAME ghs.l.google.com.

And run a small webserver on whatever IP you point example.com at, which simply does an HTTP redirect to http://www.example.com/
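A minimal sketch of such a redirecting webserver, using only Python's standard library, might look like this. The port number, the choice of a 301 status, and the helper name `redirect_location` are assumptions for this example; any webserver that can issue a redirect would do the same job.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

TARGET = "http://www.example.com"  # redirect destination named in the answer

def redirect_location(path):
    """Build the Location header value, preserving the requested path and query."""
    return TARGET + (path if path.startswith("/") else "/" + path)

class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Answer every request with a permanent redirect to the www host.
        self.send_response(301)
        self.send_header("Location", redirect_location(self.path))
        self.send_header("Content-Length", "0")
        self.end_headers()

    do_HEAD = do_GET  # HEAD gets the same headers and no body

def run(port=8080):
    """Serve redirects until interrupted; port 8080 is an arbitrary choice."""
    HTTPServer(("", port), RedirectHandler).serve_forever()
```

Calling `run()` starts the server. To serve the bare domain directly it would need to listen on port 80, which usually requires elevated privileges or a front-end proxy.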

It's somewhat surprising that it worked as well as it did. Postel's law gets some credit there, I believe. :-)

Back to RFC 1034 3.6.2:

CNAME RRs cause special action in DNS software. When a name server fails to find a desired RR in the resource set associated with the domain name, it checks to see if the resource set consists of a CNAME record with a matching class. If so, the name server includes the CNAME record in the response and restarts the query at the domain name specified in the data field of the CNAME record. The one exception to this rule is that queries which match the CNAME type are not restarted.

So, in this case it could be argued that the DNS server would/should not follow the CNAME on an MX lookup unless there were no MX records found.
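One possible reading of that passage, applied to a node that (against the RFC's advice) holds both MX and CNAME data, is sketched below. It is a toy model for illustration only; the zone layout, the function name `answer_query`, and the placeholder address are assumptions, and real name servers may behave differently with such a zone.

```python
def answer_query(zone, name, qtype, max_restarts=8):
    """Return the records a toy name server would put in its response."""
    response = []
    for _ in range(max_restarts):
        rrset = zone.get(name, {})
        if qtype in rrset:
            # The desired RR type exists at this node: answer directly.
            # This branch also covers queries for the CNAME type itself.
            response += [(name, qtype, value) for value in rrset[qtype]]
            return response
        if "CNAME" in rrset:
            # Desired type not found: include the CNAME and restart at its target.
            target = rrset["CNAME"][0]
            response.append((name, "CNAME", target))
            name = target
            continue
        return response  # nothing more to add
    return response

# Hypothetical zone with the misconfiguration discussed above.
zone = {
    "example.com.": {
        "MX": ["10 aspmx.l.google.com."],
        "CNAME": ["ghs.l.google.com."],
    },
    "ghs.l.google.com.": {"A": ["192.0.2.10"]},  # placeholder address
}

print(answer_query(zone, "example.com.", "MX"))
print(answer_query(zone, "example.com.", "A"))
```

Under this reading the MX query is answered from the MX records and never touches the CNAME, while the A query is restarted at ghs.l.google.com. That would be consistent with mail working for senders whose lookups behaved this way.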

When sending mail, Sendmail and qmail (and likely others) will by default attempt to rewrite any CNAME used in the right-hand side of an email address to the canonical name.

Indeed, some sites relied on this behavior. djb goes into some detail on why he thinks people should stop relying on it in his "CNAME records in mail" document.