Web-server – How do we create fail-over DNS for web services

dns-hosting, domain-name-system, web-applications, web-hosting, web-server

I am trying to set up a completely failsafe DNS system. Our web application serving infrastructure is sound, with multiple failover servers, but DNS is a weak point for us. We currently tell our domain registrar to use our hosting provider's name servers. From there I utilise a virtual DNS server to forward to the relevant web server/load balancer.

However, my problem is this: if the domain registrar's name servers were to go down, surely we would lose our site as well? Am I correct in this assumption? If so, how would we make this fail-safe? I have researched managed DNS providers, which provide multiple fail-over DNS servers, but even if we use one, doesn't the domain name registrar remain the weak point in the chain?

Thanks for any assistance.

Best Answer

I'm very much a "do it yourself" guy when it comes to almost all tech, including hosting. But I don't host my own DNS, because it's so critically important and commercial providers are extremely cheap.

All my zones are hosted at ZoneEdit. Each of my zones has at least two US-based DNS servers (the min. required), but a couple of my more important zones also have a third server located in a separate network in Germany. I could add additional servers if I felt it was necessary. Total cost for this? About $20/year/zone.
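To see why a second or third name server in a separate network helps, here is a toy sketch (not ZoneEdit's or any real resolver's code) of a resolver working through a zone's NS set: the lookup succeeds as long as any one listed server answers. The server names and addresses are hypothetical, and reachability is simulated with a dict instead of real network queries.

```python
from typing import Dict, List, Optional

# Hypothetical NS set: two US servers plus one in a separate network in Germany.
NS_SET: List[str] = ["ns1.us-east.example", "ns2.us-west.example", "ns3.de.example"]

# Hypothetical zone data; every authoritative server holds the same copy.
ZONE: Dict[str, str] = {"www.example.com": "192.0.2.10"}


def query(server: str, name: str, up: Dict[str, bool]) -> Optional[str]:
    """Simulated query: a server that is down never answers (like a timeout)."""
    if not up.get(server, False):
        return None
    # Toy simplification: "no such name" also comes back as None here,
    # whereas a real resolver treats NXDOMAIN as a final answer.
    return ZONE.get(name)


def resolve(name: str, up: Dict[str, bool]) -> Optional[str]:
    """Ask each listed authoritative server in turn until one answers."""
    for server in NS_SET:
        answer = query(server, name, up)
        if answer is not None:
            return answer
    return None  # every listed server failed: the name stops resolving


# Both US servers down, the German one still up: the lookup still works.
print(resolve("www.example.com", {"ns3.de.example": True}))  # 192.0.2.10
# All servers down: the lookup fails.
print(resolve("www.example.com", {}))  # None
```

Real resolvers usually pick among the NS set by measured response time rather than strictly in list order, but the outcome is the same: one reachable authoritative server is enough.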

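Edit: The concern about a registrar's servers going down is understandable but unwarranted. The registrar is not part of the lookup path: it only submits your NS records (and any glue records) to the registry for your TLD, which publishes them in the TLD zone. Resolvers walk the hierarchy from the root servers to the TLD servers and then to your authoritative name servers, so your site will continue to work even if the registrar's own servers go offline. What must stay reachable are the root and TLD servers, which are operated with heavy redundancy, and at least one of your own authoritative servers, which is what the extra servers above protect.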
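A toy model of iterative resolution makes the point concrete: the resolver follows referrals from the root to the TLD to your authoritative servers, and the registrar never appears as a step. All server names here are hypothetical, each "server" is just a dict, and the caching that real resolvers do is left out.

```python
from typing import Any, Dict, List, Optional, Set, Tuple

# Each hypothetical server maps a zone (or name) to either a referral
# ("NS", [servers for the next level down]) or a final answer ("A", address).
SERVERS: Dict[str, Dict[str, Tuple[str, Any]]] = {
    "root": {"com": ("NS", ["tld-com"])},
    "tld-com": {"example.com": ("NS", ["ns1.dns-host", "ns2.dns-host"])},
    "ns1.dns-host": {"www.example.com": ("A", "192.0.2.10")},
    "ns2.dns-host": {"www.example.com": ("A", "192.0.2.10")},
}
# There is no "registrar" entry: the registrar only supplied the NS records
# that now live in the "tld-com" data, and it is never queried.


def lookup(server: str, name: str) -> Optional[Tuple[str, Any]]:
    """Return the most specific entry on `server` that covers `name`."""
    data = SERVERS[server]
    for key in sorted(data, key=len, reverse=True):
        if name == key or name.endswith("." + key):
            return data[key]
    return None


def resolve_iteratively(name: str, down: Set[str]) -> Tuple[Optional[str], List[str]]:
    """Follow referrals from the root; return (address or None, servers asked)."""
    servers: List[str] = ["root"]
    asked: List[str] = []
    for _ in range(10):  # depth guard against referral loops
        live = [s for s in servers if s not in down]
        if not live:
            return None, asked  # no server at this level answers
        asked.append(live[0])
        record = lookup(live[0], name)
        if record is None:
            return None, asked
        kind, value = record
        if kind == "A":
            return str(value), asked
        servers = list(value)  # referral: ask the next level down
    return None, asked


print(resolve_iteratively("www.example.com", down={"registrar"}))
# ('192.0.2.10', ['root', 'tld-com', 'ns1.dns-host'])
print(resolve_iteratively("www.example.com", down={"ns1.dns-host"}))
# ('192.0.2.10', ['root', 'tld-com', 'ns2.dns-host'])
print(resolve_iteratively("www.example.com", down={"tld-com"}))
# (None, ['root'])
```
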

Related Solutions

DNS Round Robin – Is It the Only Way to Ensure Instant Fail-Over for Multiple Data Centers?

When I use the term "DNS Round Robin", I generally mean it in the sense of the "cheap load balancing technique" as OP describes it.

But that's not the only way DNS can be used for global high availability. Most of the time, it's just hard for people with different (technology) backgrounds to communicate well.

The best load balancing technique (if money is not a problem) is generally considered to be:

1. An anycasted global network of 'intelligent' DNS servers,
2. and a set of globally spread-out datacenters,
3. where each DNS node implements split-horizon DNS,
4. and monitoring of availability and traffic flows is available to the 'intelligent' DNS nodes in some fashion,
5. so that the user's DNS request flows to the nearest DNS server via IP anycast,
6. and this DNS server hands out a low-TTL A record (or set of A records) for the nearest/best datacenter for this end user via 'intelligent' split-horizon DNS.
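The per-query decision in the last item of the list can be sketched as a function a DNS node runs for each request: given the client's region and the health data from the monitoring feed, it returns the A records of the nearest healthy datacenter with a low TTL. Everything here (region names, the preference table, the addresses, the 30-second TTL) is a hypothetical stand-in; it is not Akamai's or any vendor's actual logic.

```python
from typing import Dict, List, Set, Tuple

TTL_SECONDS = 30  # hypothetical "low TTL" so that changes propagate quickly

# Hypothetical load-balancer addresses per datacenter.
ADDRESSES: Dict[str, List[str]] = {
    "fra": ["192.0.2.10", "192.0.2.11"],
    "ams": ["192.0.2.20"],
    "nyc": ["198.51.100.10", "198.51.100.11"],
}

# Hypothetical nearest-first preference per client region. A real system
# would derive this from latency and traffic measurements, not a fixed table.
PREFERENCE: Dict[str, List[str]] = {
    "eu": ["fra", "ams", "nyc"],
    "us": ["nyc", "ams", "fra"],
}


def answer(client_region: str, healthy: Set[str]) -> Tuple[int, List[str]]:
    """Return (TTL, A records) for the best healthy datacenter for this client."""
    for dc in PREFERENCE[client_region]:
        if dc in healthy:  # health comes from the monitoring feed
            return TTL_SECONDS, ADDRESSES[dc]
    # Nothing healthy: this sketch returns an empty answer; a real server
    # might fall back to a static record set instead.
    return TTL_SECONDS, []


print(answer("eu", {"fra", "ams", "nyc"}))  # (30, ['192.0.2.10', '192.0.2.11'])
print(answer("eu", {"ams", "nyc"}))         # (30, ['192.0.2.20'])
print(answer("us", {"fra"}))                # (30, ['192.0.2.10', '192.0.2.11'])
```

This is the "split horizon" part: a European and an American client asking for the same name get different answers, and caches that honor the low TTL pick up a change in datacenter health within roughly that many seconds.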

Using anycast for DNS is generally fine, because DNS responses are stateless and very short. So if the BGP routes change, it's highly unlikely to interrupt a DNS query.
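To put "very short" in numbers, here is a sketch that builds the wire-format query for an A lookup (the message layout from RFC 1035) using only the standard library; the name and query ID are arbitrary examples. The whole request is a few dozen bytes, and a typical answer fits in one UDP datagram (the classic limit without EDNS0 is 512 bytes), so one packet out and one packet back is the entire exchange.

```python
import struct


def build_query(name: str, qtype: int = 1, query_id: int = 0x1234) -> bytes:
    """Build a DNS query message: 12-byte header plus one question.

    `name` is given without a trailing dot, e.g. "www.example.com".
    """
    # Header: ID, flags (0x0100 = recursion desired), QDCOUNT=1,
    # ANCOUNT=0, NSCOUNT=0, ARCOUNT=0, all big-endian 16-bit fields.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii") for label in name.split(".")
    ) + b"\x00"
    # QTYPE (1 = A) and QCLASS (1 = IN).
    question = qname + struct.pack("!HH", qtype, 1)
    return header + question


packet = build_query("www.example.com")
print(len(packet))  # 33
```
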

Anycast is less suited to the longer, stateful HTTP conversations, so this system uses split-horizon DNS for those instead. An HTTP session between a client and server is kept to one datacenter; it generally cannot fail over to another datacenter without breaking the session.

As I indicated with "set of A records", what I would call 'DNS Round Robin' can be used together with the setup above. It is typically used to spread the traffic load over multiple highly available load balancers in each datacenter (so that you get better redundancy, can use smaller/cheaper load balancers, don't overwhelm the Unix network buffers of a single host server, etc.).
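The round-robin part within one datacenter can be sketched as a simple rotation: every response lists the same load-balancer addresses, but starting one position further along, so clients that simply take the first address spread across the pool. BIND, for example, offers a cyclic ordering through its `rrset-order` option; the class and addresses below are hypothetical and only illustrate the idea.

```python
from collections import Counter
from typing import List


class RoundRobinRRset:
    """Hand out the same A records on every query, rotated by one each time."""

    def __init__(self, addresses: List[str]) -> None:
        if not addresses:
            raise ValueError("need at least one address")
        self.addresses = list(addresses)
        self.offset = 0

    def next_answer(self) -> List[str]:
        n = len(self.addresses)
        answer = [self.addresses[(self.offset + i) % n] for i in range(n)]
        self.offset = (self.offset + 1) % n
        return answer


# Hypothetical load balancers inside one datacenter.
pool = RoundRobinRRset(["192.0.2.1", "192.0.2.2", "192.0.2.3"])
print(pool.next_answer())  # ['192.0.2.1', '192.0.2.2', '192.0.2.3']
print(pool.next_answer())  # ['192.0.2.2', '192.0.2.3', '192.0.2.1']

# If every client connects to the first address it receives,
# 300 queries spread evenly: each load balancer is picked 100 times.
fresh = RoundRobinRRset(["192.0.2.1", "192.0.2.2", "192.0.2.3"])
picks = Counter(fresh.next_answer()[0] for _ in range(300))
print(sorted(picks.values()))  # [100, 100, 100]
```
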

So, is it true that, with multiple data centers and HTTP traffic, the use of DNS RR is the ONLY way to assure high availability?

No, it's not true, not if by 'DNS Round Robin' we simply mean handing out multiple A records for a domain. But it is true that clever use of DNS is a critical component of any global high availability system. The above illustrates one common (often the best) way to go.

Edit: The Google paper "Moving Beyond End-to-End Path Information to Optimize CDN Performance" seems to me to be state-of-the-art in global load distribution for best end-user performance.

Edit 2: I read the article "Why DNS Based .. GSLB .. Doesn't Work" that OP linked to, and it is a good overview -- I recommend looking at it. Read it from the top.

In the section "The solution to the browser caching issue" it advocates DNS responses with multiple A Records pointing to multiple datacenters as the only possible solution for instantaneous fail over.
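The approach the article advocates moves fail over into the client: it receives several A records and tries them in turn until one accepts a connection. Below is a minimal sketch of that client-side loop using Python's standard `socket.create_connection`. The addresses and port in the usage comment are hypothetical, the `connect` parameter exists only so the loop can be exercised without a network, and whether real browsers behave like this is exactly the doubt raised later in this answer.

```python
import socket
from typing import Any, Callable, List, Optional, Tuple

Endpoint = Tuple[str, int]


def connect_first_reachable(
    endpoints: List[Endpoint],
    timeout: float = 2.0,
    connect: Callable[..., Any] = socket.create_connection,
) -> Optional[Tuple[Endpoint, Any]]:
    """Try each (address, port) in the order DNS returned them.

    Returns the first endpoint that accepts a TCP connection together with
    the connected socket, or None if every address failed.
    """
    for endpoint in endpoints:
        try:
            sock = connect(endpoint, timeout=timeout)
        except OSError:
            # Refused, unreachable or timed out: fall through to the next record.
            continue
        return endpoint, sock
    return None


# Usage, with hypothetical A records for two datacenters:
#   a_records = ["192.0.2.10", "198.51.100.10"]
#   result = connect_first_reachable([(ip, 443) for ip in a_records])
```

The price is the timeout: every dead address at the front of the list can delay the user by up to `timeout` seconds before the next record is tried.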

In the section "Watering it down" near the bottom, it expands on the obvious: sending multiple A records is uncool if they point to datacenters on multiple continents, because the client will connect at random and thus quite often get a 'slow' DC on another continent. So for this to work really well, multiple datacenters on each continent are needed.
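The "Watering it down" point is simple arithmetic: if a client picks uniformly at random among the A records it receives, the chance of landing on another continent equals the share of records that are remote. A small sketch with hypothetical datacenter counts:

```python
from fractions import Fraction
from typing import Dict


def remote_share(records_per_continent: Dict[str, int], client_continent: str) -> Fraction:
    """Fraction of the A records in one response that point off the client's continent."""
    total = sum(records_per_continent.values())
    local = records_per_continent.get(client_continent, 0)
    return Fraction(total - local, total)


# One datacenter per continent, all in the same response: a European client
# lands on a remote datacenter two times out of three.
print(remote_share({"eu": 1, "us": 1, "asia": 1}, "eu"))  # 2/3

# Only same-continent records handed out (two European datacenters):
print(remote_share({"eu": 2}, "eu"))  # 0
```

The second case still gives the client two records to fail over between while never sending it abroad, which is why the article calls for multiple datacenters on each continent.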

This is a different solution from my steps 1-6. I can't provide a perfect answer on this; I think a DNS specialist from the likes of Akamai or Google is needed, because much of this boils down to practical know-how about the limitations of the DNS caches and browsers deployed today. AFAIK, my steps 1-6 are what Akamai does with their DNS (can anyone confirm this?).

My feeling -- coming from having worked as a PM on mobile browser portals (cell phones) -- is that the diversity and level of total brokenness of the browsers out there is incredible. I personally would not trust an HA solution that requires the end-user terminal to 'do the right thing'; thus I believe that global instantaneous fail over without breaking a session isn't feasible today.

I think my steps 1-6 above are the best that are available with commodity technology. This solution does not have instantaneous fail over.

I'd love for one of those DNS specialists from Akamai, Google etc to come around and prove me wrong. :-)

DNS Nameserver Error

It is very simple.

For the name ns2.domain.com, you have one A (or AAAA) record in your zone, and one glue record in the TLD's zone. They point to different addresses.

The glue record is set up by your registrar. The other one is set up by you in your DNS configuration. If the address you have set for the ns2 host yourself is right, let your registrar know they need to update the glue records for your domain (and give them the new IP). If it is wrong, change the A or AAAA record for your ns2 host to match what is in the glue.
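The rule in the previous paragraph can be written as a small helper that compares the glue addresses with the ones in your own zone and says which side to change. It assumes you already know which address is correct (only you can decide that); the function name is hypothetical and the addresses come from the documentation ranges.

```python
from typing import Set


def glue_fix(glue: Set[str], zone: Set[str], correct: Set[str]) -> str:
    """Say which record to change so that glue and zone data both match `correct`."""
    if glue == zone == correct:
        return "consistent: nothing to do"
    steps = []
    if glue != correct:
        # Glue lives in the TLD zone, so only the registrar can change it.
        steps.append("ask the registrar to update the glue to " + ", ".join(sorted(correct)))
    if zone != correct:
        # The A/AAAA record for the ns host lives in your own zone.
        steps.append("change the A/AAAA record in your zone to " + ", ".join(sorted(correct)))
    return "; ".join(steps)


# Your zone is right, the glue is stale:
print(glue_fix({"192.0.2.53"}, {"198.51.100.53"}, correct={"198.51.100.53"}))
# ask the registrar to update the glue to 198.51.100.53
```
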

You can find what is in the glue record by querying the TLD name servers. Here is an example for the com. zone on Linux:

dig @m.gtld-servers.net in ns google.com

This will typically return an additional section with the A records for the name servers. You could also make the query more direct:

dig @m.gtld-servers.net in a ns1.google.com
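To compare the glue automatically, one option is to feed the output of the dig commands above into a small parser that pulls the A/AAAA records out of the ADDITIONAL SECTION. The sample text below is hypothetical (example.com names, documentation addresses) and only imitates dig's usual layout; the parser ignores every other section.

```python
from typing import Dict, List

# Hypothetical dig output for a referral from a TLD server.
SAMPLE = """\
;; AUTHORITY SECTION:
example.com.        172800  IN  NS    ns1.example.com.
example.com.        172800  IN  NS    ns2.example.com.

;; ADDITIONAL SECTION:
ns1.example.com.    172800  IN  A     192.0.2.53
ns2.example.com.    172800  IN  A     198.51.100.53
ns2.example.com.    172800  IN  AAAA  2001:db8::53

;; Query time: 20 msec
"""


def additional_addresses(dig_output: str) -> Dict[str, List[str]]:
    """Map each host in dig's ADDITIONAL SECTION to its A/AAAA addresses."""
    result: Dict[str, List[str]] = {}
    in_additional = False
    for raw in dig_output.splitlines():
        line = raw.strip()
        if line.startswith(";; ADDITIONAL SECTION"):
            in_additional = True
            continue
        if not in_additional:
            continue
        if not line or line.startswith(";"):
            break  # a blank or comment line ends the section
        fields = line.split()  # owner, TTL, class, type, data
        if len(fields) >= 5 and fields[3] in ("A", "AAAA"):
            result.setdefault(fields[0].rstrip("."), []).append(fields[4])
    return result


print(additional_addresses(SAMPLE))
# {'ns1.example.com': ['192.0.2.53'], 'ns2.example.com': ['198.51.100.53', '2001:db8::53']}
```

With real output you would pass in something like `subprocess.run(["dig", "@m.gtld-servers.net", "in", "ns", "example.com"], capture_output=True, text=True).stdout`.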
