Multiple, replicated DNS providers with Route 53

amazon-route53 dns-hosting

I manage a few fairly simple domains with only A records, CNAMEs, and a few other common record types. One thing I hear about constantly is major outages at various professional DNS providers, e.g. DynDNS and DNSimple, as well as more basic providers such as GoDaddy.

AWS Route 53 has never gone down, has massive replication on AWS's side, and comes with a 100% uptime SLA, but I'm wondering if putting all of our eggs in one basket is just asking for trouble.

Does anyone know of a good way to have Route 53 be either the primary or secondary DNS provider (even though there's really no such thing as primary/secondary in DNS land) and have an alternate provider that can automatically detect changes and then push them to, or pull them from, Route 53?

Can anyone list some techniques being used to prevent a single point of failure, not in the hardware, but in relying on a single provider?

Best Answer

Route 53 does not support zone transfers.

DNS Zone Transfers (AXFR/IXFR) support for Route53 is a hotly asked for feature, and is one that we will consider adding in the future.

https://forums.aws.amazon.com/click.jspa?searchID=6666267&messageID=326081

You could, of course, manage your DNS records in an internal system, and programmatically push the settings to Route 53 via the API as well as to another provider using the other provider's interface, but that's not exactly "automatic."
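As a rough sketch of that approach, the snippet below keeps records in one internal list and renders them twice: once as the change batch structure that Route 53's API accepts, and once as plain zone-file lines that a second provider might import. The record list, the function names, and the choice of zone-file text for the second provider are all assumptions made for illustration, not something either provider prescribes.

```python
# Hypothetical internal source of truth: (name, type, ttl, values).
RECORDS = [
    ("www.example.com.", "A", 300, ["192.0.2.10"]),
    ("blog.example.com.", "CNAME", 300, ["www.example.com."]),
]


def to_route53_change_batch(records, comment="sync from internal source"):
    """Build a ChangeBatch dict in the shape Route 53's API expects."""
    changes = []
    for name, rtype, ttl, values in records:
        changes.append({
            "Action": "UPSERT",  # create the record set, or overwrite it
            "ResourceRecordSet": {
                "Name": name,
                "Type": rtype,
                "TTL": ttl,
                "ResourceRecords": [{"Value": v} for v in values],
            },
        })
    return {"Comment": comment, "Changes": changes}


def to_zone_file_lines(records):
    """Render the same records as zone-file lines for a second provider."""
    return [
        f"{name} {ttl} IN {rtype} {value}"
        for name, rtype, ttl, values in records
        for value in values
    ]


if __name__ == "__main__":
    batch = to_route53_change_batch(RECORDS)
    print(len(batch["Changes"]), "changes prepared for Route 53")
    print("\n".join(to_zone_file_lines(RECORDS)))
```

With boto3, the batch could then be submitted via `client("route53").change_resource_record_sets(HostedZoneId=..., ChangeBatch=batch)`. How the second copy gets uploaded depends entirely on the other provider's interface.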

A failure in Route 53 is always possible, but seems very unlikely. It appears to have been designed in a way that should allow numerous catastrophic faults to develop without actually impacting availability.

The service is centrally managed, but globally distributed, and the 4 name servers assigned to each of your hosted zones are not 4 actual servers, each in a specific place, but actually 4 anycast IP addresses that correspond to more than 4 actual servers across the globe.

Although the architecture details aren't public, anecdotal observations suggest that when you update your DNS records, they are pushed out to all the individual servers almost immediately, so there isn't a reliance on a central database when the individual name servers need to look up data. They have replicated copies. Once data is globally propagated, it's available in the edge locations for queries, and the centralized management infrastructure could be completely disabled without impacting the ability to serve queries.

Note also that no two of your hosted zones will ever have more than 2 assigned name servers in common, so even an outage that coincidentally impacted all of the servers for one of your zones should not impact more than one of your zones.
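If you want to check the overlap for your own zones, a small comparison like the one below is enough. The zone names and name server names here are made up for illustration; real values could be read from the `DelegationSet` that boto3's `get_hosted_zone` returns for each zone.

```python
from itertools import combinations

# Made-up delegation sets, keyed by zone name (illustrative values only).
ZONES = {
    "example.com": {
        "ns-1.awsdns-01.com", "ns-600.awsdns-11.net",
        "ns-1100.awsdns-09.org", "ns-1700.awsdns-20.co.uk",
    },
    "example.org": {
        "ns-2.awsdns-02.com", "ns-600.awsdns-11.net",
        "ns-1200.awsdns-22.org", "ns-1800.awsdns-33.co.uk",
    },
}


def shared_name_servers(zones):
    """Map each pair of zones to the name servers they have in common."""
    return {
        (a, b): zones[a] & zones[b]
        for a, b in combinations(sorted(zones), 2)
    }


if __name__ == "__main__":
    for pair, shared in shared_name_servers(ZONES).items():
        print(pair, "share", len(shared), "name server(s)")
```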

The fact that anycast is used should also imply that if a serious outage takes a location in the edge network completely offline, that edge's route announcements would disappear, causing traffic to automatically route to a different edge where the same IP addresses are being announced.

You'll notice that your 4 name server names are distributed among 4 top-level domains (.com, .net, .org, .co.uk), which guards against a DNS problem in any one of those domains taking all of your name servers down with it. Route 53 appears to have been engineered to accommodate even potential problems that would not have been under Route 53's control if they did occur.
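A quick way to confirm this spread for a given zone is to pull out the domain suffix of each assigned name server. The sketch below assumes the names follow the `ns-N.awsdns-M.<suffix>` pattern; the sample names are invented, and the function name is hypothetical.

```python
def name_server_suffixes(name_servers):
    """Collect the domain suffix of each name server name.

    Assumes names shaped like "ns-N.awsdns-M.<suffix>", so everything
    after the second label is treated as the suffix (e.g. "co.uk").
    """
    suffixes = set()
    for ns in name_servers:
        parts = ns.rstrip(".").split(".", 2)
        if len(parts) == 3:
            suffixes.add(parts[2])
    return suffixes


if __name__ == "__main__":
    # Invented sample names for illustration.
    sample = [
        "ns-1.awsdns-01.com",
        "ns-600.awsdns-11.net",
        "ns-1100.awsdns-09.org",
        "ns-1700.awsdns-20.co.uk.",
    ]
    print(sorted(name_server_suffixes(sample)))
```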

Outages are never truly impossible, but my assessment of Route 53's design (based on what I can observe externally) suggests that service outages in Route 53 are so unlikely as to make failover services unnecessary, and, arguably, adding an alternate service could potentially reduce reliability, if the alternate service were not as resilient as Route 53 appears to be.

The AWS Architecture Blog's Route 53 category gives some interesting insights on the resilience designed into Route 53.
