Docker – to load balance (or not) across multiple hosts in a Docker Swarm cluster

cluster, docker, docker-swarm, load-balancing

We have 3 hosts in a swarm cluster, with a web application deployed in the cluster. The web application runs on only one host at any given time; if that host dies, the application is moved to another host.

Docker will take care of routing a request to the web application regardless of which host the request hits.
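For context, this is the ingress routing mesh: publishing a port on a service makes every node in the swarm listen on that port and forward traffic to wherever the task actually runs. A minimal sketch, assuming a placeholder image my-web-image:

# With the port published, ALL swarm nodes accept traffic on 8080 and
# route it to the one node currently running the task.
docker service create --name web --replicas 1 -p 8080:80 my-web-image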

To ensure that we always reach a host that is up and running, we thought of putting Nginx in front. We would create a virtual host in Nginx that proxies requests to the Docker swarm.

We have two approaches for this.

A. We would simply "round robin" requests across the hosts.

This is a simple approach. We would use Nginx to take hosts out of rotation when they fail. However, because the swarm routes requests internally, Nginx might get a 500 error back from host 1 even though the web application actually produced that 500 on host 3. Nginx would then incorrectly conclude that the service on host 1 is failing and take it out of rotation.
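A minimal sketch of option A, assuming placeholder host IPs and a published port of 8080:

# Round robin across all three swarm nodes; after repeated failed
# attempts a server is taken out of rotation - which, as noted above,
# may blame the wrong host.
upstream swarm {
    server 10.0.0.11:8080 max_fails=3 fail_timeout=30s;
    server 10.0.0.12:8080 max_fails=3 fail_timeout=30s;
    server 10.0.0.13:8080 max_fails=3 fail_timeout=30s;
}

server {
    listen 80;
    location / {
        proxy_pass http://swarm;
    }
}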

B. We would direct all requests to the swarm leader.

We would not use Nginx to load balance across the hosts; instead we would send all requests to the Docker Swarm leader (configuring Nginx to do this through various scripts). This way we avoid "double" load balancing (both Nginx and Docker Swarm), but all traffic goes through the Docker Swarm leader only.
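Option B, as a sketch: a single upstream entry that the scripts rewrite (and then reload Nginx) whenever the leader changes; the IP here is a placeholder:

# All traffic goes to the current swarm leader only.
upstream swarm_leader {
    server 10.0.0.11:8080;   # placeholder for the current leader's IP
}

server {
    listen 80;
    location / {
        proxy_pass http://swarm_leader;
    }
}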

On the one hand, solution A is simple and easy to understand, but it adds complexity in the form of double load balancing. Solution B is less standard and therefore more convoluted, but it may keep the traffic flow easier to reason about.

Which approach should we – from a pure technical perspective – prefer?

Best Answer

I think of this as a comment on option A.

I'm looking at this too, and from what I can tell, in Docker 1.12 and later (swarm mode) all you have to do is create a reverse proxy to the service/website. What I did in my proof of concept:

1) Created two overlay networks: a proxy network that is external facing, and a service network that is internal only.

2) Deployed 3 replicas of the service, attached to both the service overlay network and the proxy overlay network (--network proxy --network my-service).

3) Stood up an nginx reverse proxy, bound to my corporate alias, that accepts incoming connections on ports 80 and 443 (the command for that is shown after the sketch below).
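Steps 1 and 2 might look roughly like this (my-service-image is a placeholder for the real image):

# Step 1: two overlay networks - an external-facing one for the proxy
# and an internal-only one for the service.
sudo docker network create --driver overlay proxy
sudo docker network create --driver overlay my-service

# Step 2: three replicas of the service, attached to both networks.
sudo docker service create --name my-service --replicas 3 \
    --network proxy --network my-service \
    my-service-image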

# Step 3: the proxy service, published on ports 80/443 and attached to
# the external-facing overlay network; config and certs are mounted from
# Azure File volumes so every node sees the same files.
sudo docker service create --name proxy \
    -p 80:80 \
    -p 443:443 \
    --network proxy \
    --mount type=volume,source=reverse-proxy,target=/etc/nginx/conf.d,volume-driver=azurefile \
    --mount type=volume,source=ssl,target=/etc/nginx/ssl,volume-driver=azurefile \
    nginx

The mounts are simply there so I don't have to copy files to each host. They point to an Azure File Storage account, which makes spinning up more containers much more maintainable.

The reverse proxy looks like this:

location /my-service/ {
    proxy_read_timeout 900;
    # "my-service" is resolved by the swarm's internal DNS to the service VIP
    proxy_pass http://my-service/;
    # preserve the original host and client details for the upstream app
    proxy_set_header Host $http_host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;
}

http://my-service is a swarm-internal DNS name that always resolves to a VIP for my service, whether it is running on 1 node or 1,000 nodes. From what I can tell, every service you deploy gets its own subnet range, and the manager nodes keep track of where the containers are running and handle the routing for you.
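If you want to see that VIP for yourself, service inspect will show it per attached network:

# Prints the virtual IPs assigned to the service on each of its networks.
sudo docker service inspect --format '{{json .Endpoint.VirtualIPs}}' my-service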

Currently we have multiple A records pointing at the different host IPs, which gives us round-robin DNS; we may move this from Windows Server managed DNS to Azure Traffic Manager for more control.
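For illustration, that round-robin setup is just several A records for one name (the name and IPs here are made up):

swarm.example.com.    300    IN    A    10.0.0.11
swarm.example.com.    300    IN    A    10.0.0.12
swarm.example.com.    300    IN    A    10.0.0.13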