Web-server – Eliminate single point of failure for webservers

failover redundancy web-server

I know that in DNS, each of the listed DNS servers will be tried in turn to see if it responds.

I know that in email, if a server fails, delivery moves on to the next server in the list, or the mail is held for a period of time.

As far as I know, in webservers, the browser will get one of the webserver IP addresses and try it and if it fails it will give up. Is this correct? If so, then the only way to direct traffic away from a failed IP address would be with the DNS servers and even that would not update immediately.

Best Answer

If you want no single points of failure at all, you need to do global server load balancing -- you obviously can't rely on a single datacentre, and even with a redundant BGP configuration, your BGP tables constitute a single point of failure that can be messed up if someone pushes a bad config.

What you do is configure DNS to advertise multiple IP addresses in the A records for your domain name, each pointing at a copy of your site in a different datacentre (preferably in different cities). The browser will pick one and store the others. The pick is usually random, but watch out for Windows Vista, which implements the stupid bits of RFC 3484 and is thus not random. Depending on the browser, it will generally switch to one of the other addresses if the one it's using becomes unavailable. Your DNS servers have to continually monitor all of the sites and stop advertising any that go down. The records also need very short TTLs, so that resolvers drop a dead address quickly. There are hardware solutions for this -- e.g. F5's BIG-IP devices.
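As a rough illustration of the monitoring side, here is a minimal Python sketch of the "stop advertising any site that goes down" logic. Everything named in it is an assumption for the example: the addresses come from the RFC 5737 documentation ranges, the 30-second TTL is arbitrary, and the health check is a plain TCP connect. The fallback to advertising every site when all checks fail is also my own design choice, not something a particular product is known to do.

```python
import socket

# Hypothetical site addresses, one per datacentre (RFC 5737 documentation ranges).
SITES = ["192.0.2.10", "198.51.100.10", "203.0.113.10"]
TTL_SECONDS = 30  # assumed value for a "very short" TTL


def tcp_check(ip, port=80, timeout=2.0):
    """Return True if a TCP connection to ip:port succeeds within the timeout."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False


def addresses_to_advertise(sites, check=tcp_check):
    """Return only the sites that pass the health check.

    If every check fails, advertise all sites anyway: the monitor itself may
    be the broken part, and an empty answer guarantees an outage.
    """
    healthy = [ip for ip in sites if check(ip)]
    return healthy if healthy else list(sites)


def a_records(name, sites, check=tcp_check, ttl=TTL_SECONDS):
    """Render the advertised sites as zone-file style A records."""
    return [f"{name} {ttl} IN A {ip}" for ip in addresses_to_advertise(sites, check)]
```

A real setup would run this on a timer from more than one vantage point and feed the result into whatever DNS server you use. The `check` parameter is injectable so the selection logic can be exercised without touching the network.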

You'll also need ways to replicate your database, your files, and your users' session state between the datacentres in real time.

You'll obviously also need to get network diagrams and supplier lists from all of your ISPs to make sure that all your network routes are fully geographically diverse and the ISPs don't rely on the same upstream supplier. It's probably worth making sure they're not on the same power grid, as well.

See here for more information on global server load balancing (although it's a bit old and out-of-date): http://www.tenereillo.com/GSLBPageOfShame.htm

Your failover won't be quite as fast as BGP failover, but you won't be able to bring your site entirely down with a single bad BGP config. You may mess up the configuration of a single DNS server or datacentre, but that won't bring you down completely (unless you push your DNS updates automatically to all of your DNS servers).
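One way to avoid the "push to everything at once" trap is to roll a DNS change out one server at a time and stop at the first server that misbehaves. The Python sketch below shows only that control flow. The `push` and `verify` callables are hypothetical placeholders for whatever deployment and validation steps you actually use.

```python
def staged_push(servers, push, verify):
    """Push a config to one server at a time, stopping at the first failure.

    Returns the servers that were updated and passed verification. Servers
    after a failure are never touched, so they keep serving the old config.
    """
    updated = []
    for server in servers:
        push(server)
        if not verify(server):
            break  # halt the rollout; remaining servers keep the old config
        updated.append(server)
    return updated
```

If verification fails on the second of three servers, only that one is left in a bad state, and the third keeps answering with the old, working records while you fix the change.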
