Docker – to load balance (or not) across multiple hosts in a Docker Swarm cluster

cluster, docker, docker-swarm, load-balancing

We have 3 hosts in a swarm cluster, with a web application deployed in the cluster. The web application runs on only one host at any given time; if that host dies, the application is moved to another host.

Docker will take care of routing a request to the web application regardless of which host the request hits.
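For context, this is the ingress routing mesh: publishing a port on a service makes every node in the swarm listen on that port and forward traffic to wherever the task actually runs. A minimal sketch, assuming a placeholder image my-web-image:

# With the port published, ALL swarm nodes accept traffic on 8080 and
# route it to the one node currently running the task.
docker service create --name web --replicas 1 -p 8080:80 my-web-image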

To ensure that we always reach a host that is up and running, we thought of putting Nginx in front. We would create a virtual host in Nginx that proxies requests to the Docker swarm.

We have two approaches for this.

A. We would simply "round robin" requests across the hosts.

This is a simple approach. We would use Nginx to take hosts out of rotation when they fail. However, because the swarm routes requests internally, Nginx might get a 500 error back from host 1 even though the web application actually produced that 500 on host 3. Nginx would then incorrectly conclude that the service on host 1 is failing and take it out of rotation.
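A minimal sketch of option A, assuming placeholder host IPs and a published port of 8080:

# Round robin across all three swarm nodes; after repeated failed
# attempts a server is taken out of rotation - which, as noted above,
# may blame the wrong host.
upstream swarm {
    server 10.0.0.11:8080 max_fails=3 fail_timeout=30s;
    server 10.0.0.12:8080 max_fails=3 fail_timeout=30s;
    server 10.0.0.13:8080 max_fails=3 fail_timeout=30s;
}

server {
    listen 80;
    location / {
        proxy_pass http://swarm;
    }
}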

B. We would direct all requests to the swarm leader.

We would not use Nginx to load balance across the hosts; instead we would send all requests to the Docker Swarm leader (configuring Nginx to do this through various scripts). This way we avoid "double" load balancing (both Nginx and Docker Swarm), but all traffic goes through the Docker Swarm leader only.
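Option B, as a sketch: a single upstream entry that the scripts rewrite (and then reload Nginx) whenever the leader changes; the IP here is a placeholder:

# All traffic goes to the current swarm leader only.
upstream swarm_leader {
    server 10.0.0.11:8080;   # placeholder for the current leader's IP
}

server {
    listen 80;
    location / {
        proxy_pass http://swarm_leader;
    }
}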

On the one hand, solution A is simple and easy to understand, but it adds complexity in the form of double load balancing. Solution B is less standard and therefore more convoluted, but it may keep the traffic flow easier to reason about.

Which approach should we – from a pure technical perspective – prefer?

Best Answer

I think of this as a comment on option A.

I'm looking at this too, and from what I can tell, in Docker 1.12 and later (swarm mode) all you have to do is create a reverse proxy to the service/website. What I did in my proof of concept:

1) Created two overlay networks: a proxy network that is external facing, and a service network that is internal only.

2) Deployed 3 replicas of the service, attached to both the service overlay network and the proxy overlay network (--network proxy --network my-service).

3) Stood up an nginx reverse proxy, bound to my corporate alias, that accepts incoming connections on ports 80 and 443 (the command for that is shown after the sketch below).
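Steps 1 and 2 might look roughly like this (my-service-image is a placeholder for the real image):

# Step 1: two overlay networks - an external-facing one for the proxy
# and an internal-only one for the service.
sudo docker network create --driver overlay proxy
sudo docker network create --driver overlay my-service

# Step 2: three replicas of the service, attached to both networks.
sudo docker service create --name my-service --replicas 3 \
    --network proxy --network my-service \
    my-service-image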

# Step 3: the proxy service, published on ports 80/443 and attached to
# the external-facing overlay network; config and certs are mounted from
# Azure File volumes so every node sees the same files.
sudo docker service create --name proxy \
    -p 80:80 \
    -p 443:443 \
    --network proxy \
    --mount type=volume,source=reverse-proxy,target=/etc/nginx/conf.d,volume-driver=azurefile \
    --mount type=volume,source=ssl,target=/etc/nginx/ssl,volume-driver=azurefile \
    nginx

The mounts are simply there so I don't have to copy files to each host. They point to an Azure File Storage account, which makes spinning up more containers much more maintainable.

The reverse proxy looks like this:

location /my-service/ {
    proxy_read_timeout 900;
    # "my-service" is resolved by the swarm's internal DNS to the service VIP
    proxy_pass http://my-service/;
    # preserve the original host and client details for the upstream app
    proxy_set_header Host $http_host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;
}

http://my-service is a swarm-internal DNS name that always resolves to a VIP for my service, whether it is running on 1 node or 1,000 nodes. From what I can tell, every service you deploy gets its own subnet range, and the manager nodes keep track of where the containers are running and handle the routing for you.
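If you want to see that VIP for yourself, service inspect will show it per attached network:

# Prints the virtual IPs assigned to the service on each of its networks.
sudo docker service inspect --format '{{json .Endpoint.VirtualIPs}}' my-service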

Currently we have multiple A records pointing at the different host IPs, which gives us round-robin DNS; we may move this from Windows Server managed DNS to Azure Traffic Manager for more control.
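For illustration, that round-robin setup is just several A records for one name (the name and IPs here are made up):

swarm.example.com.    300    IN    A    10.0.0.11
swarm.example.com.    300    IN    A    10.0.0.12
swarm.example.com.    300    IN    A    10.0.0.13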