Reverse Proxy – How to Set Up Traefik for High Availability

high-availability, reverse-proxy

I am trying to set up Traefik on a production site, and I'm struggling with some high availability issues. I think we still need a reverse-proxy in front of the Traefik cluster. Here are the potential setups that I've considered, and why the reverse-proxy seems to be needed:

  1. Set up DNS A records to point to each of the Traefik nodes for load balancing and failover.

    • This practice is discouraged according to multiple sites including this SO question and this SF question.

    • Even using a service like DNSMadeEasy seems to be discouraged due to DNS caching and TTL issues.

  2. Point one DNS record to one of the nodes running Traefik.

    • That node becomes a SPOF. My nodes are running on CoreOS, which reboots after every update, so we would be guaranteed to have a few minutes of downtime each week.

    • We could move the DNS record to an alternate node whenever downtime is expected. This would be a pain to manage manually. I can envision a solution paired with locksmithd that handles this automatically, but I don't really want to build it and it wouldn't handle unexpected downtime.

    • Part of the rationale for using Docker Swarm (or Kubernetes) is to make nodes interchangeable.

  3. Put a load-balancer/reverse-proxy in front of the Traefik cluster. The reverse-proxy can provide failover between all the Traefik nodes, and DNS would point to the reverse-proxy.

    • Yes, this is a SPOF, but in my experience, it is pretty easy to get good uptime with this setup. If occasional maintenance is required, the DNS record can be pointed to a new proxy.

Am I missing something or overthinking this?

Best Answer

There are several kinds of solutions.

1) Build your own HA load balancer in front of your Swarm/Kubernetes cluster to distribute the traffic and perform failover.

There are a lot of different appliances out there:

  1. Netscaler
  2. Kemp
  3. F5

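While this approach is HA, it is usually not cheap.

A cheaper alternative could be an Nginx/HAProxy + Keepalived setup.

However, you of course need a floating IP and have to take care of the ARP caches: when the floating IP moves to another node, the new holder has to announce it (typically via gratuitous ARP) so that neighbours stop sending traffic to the old node.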
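To make the failover part a bit more concrete, here is a minimal Python sketch of the kind of health check a load balancer (or a Keepalived check script) performs before sending traffic to a Traefik node. It is only an illustration, not how Nginx, HAProxy or Keepalived are actually implemented; the function names `tcp_alive` and `pick_backend`, the port 443 and the one-second timeout are assumptions I made up for the example.

```python
import socket
from typing import Callable, Iterable, Optional


def tcp_alive(host: str, port: int = 443, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Port and timeout are illustrative defaults, not values from the setup above.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def pick_backend(nodes: Iterable[str],
                 is_alive: Callable[[str], bool] = tcp_alive) -> Optional[str]:
    """Return the first node that passes the health check, or None if all fail."""
    for node in nodes:
        if is_alive(node):
            return node
    return None
```

Passing the probe in as `is_alive` keeps the selection logic separate from the network call, so you can try it out with a fake probe: `pick_backend(["10.0.0.1", "10.0.0.2"], is_alive=lambda n: n == "10.0.0.2")` skips the first node and returns the second.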

2) Make use of a "cloud load balancer". DigitalOcean, AWS, GKE and OpenStack all provide such a feature. It is easier to set up (most of the time); whether it is cheaper is something you have to calculate.

On DigitalOcean the LB is just $20, and there is a beta of a managed Kubernetes cluster. You may want to have a look at it. All components plug together well: https://www.digitalocean.com/products/kubernetes/

3) If your apps are not 100% critical, I can suggest a special solution I've used so far:

Cloudflare + low TTL + https://github.com/Berndinox/cloudflare-ddns

It works as simply as this: https://github.com/Berndinox/compose-v3-collection/blob/master/wordpress/www.yml

How: It spins up WordPress and all its requirements, including the DNS container. The DNS container updates the domain's DNS record on Cloudflare (the IP differs depending on which host the container starts on). If a host is rebooted or the container healthcheck fails, the container is rescheduled. If the host it originally ran on is offline, the container starts on another host and then pushes the new IP to Cloudflare. All of that happens automatically, without any manual intervention. :)

The Cloudflare TTLs are really low, so there may be just a few seconds of downtime.
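For illustration, the update step could look roughly like the Python sketch below. This is not the code of the linked cloudflare-ddns container; it only shows the idea, assuming Cloudflare's v4 API endpoint for overwriting a DNS record and an API token with DNS edit rights. The helper names, the placeholder IDs and the TTL of 60 seconds are my own assumptions.

```python
import json
import urllib.request
from typing import Optional

API = "https://api.cloudflare.com/client/v4"


def needs_update(published_ip: Optional[str], current_ip: str) -> bool:
    """True when the record is missing or points at a different host."""
    return published_ip != current_ip


def build_update_request(zone_id: str, record_id: str, token: str,
                         name: str, ip: str, ttl: int = 60) -> urllib.request.Request:
    """Build (but do not send) the PUT request that points `name` at `ip`.

    zone_id, record_id and token are placeholders you would take from your
    own Cloudflare account; ttl=60 is an assumed low TTL, not a recommendation.
    """
    body = json.dumps({
        "type": "A",
        "name": name,
        "content": ip,
        "ttl": ttl,
        "proxied": False,
    }).encode("utf-8")
    return urllib.request.Request(
        f"{API}/zones/{zone_id}/dns_records/{record_id}",
        data=body,
        method="PUT",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```

A container would run this on start: look up its host's public IP, compare it with the published record via `needs_update`, and only then send the request with `urllib.request.urlopen(req)`. Building the request separately from sending it makes the logic easy to check without touching the real API.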
