Docker stack deploy without downtime


I am following the Docker tutorial and writing my own version of it:

version: "3.1"
services:
    web:
        image: registry.gitlab.com/xxxx/xxxx:latest
        deploy:
            replicas: 2
        ports:
            - "8888:80"
    mysql:
        image: mysql:latest
        environment:
            MYSQL_ROOT_PASSWORD: password
            MYSQL_USER: user
            MYSQL_PASSWORD: password    # the official mysql image reads MYSQL_PASSWORD, not MYSQL_PASS
        ports:
            - "8889:3306"
        volumes:
            - mysql-data:/var/lib/mysql
volumes:
    mysql-data:

Whenever I change some code, I rebuild the Docker image and run an update:

docker stack deploy --compose-file docker-compose.yml xxxx-learn

Then I noticed some downtime. Swarm starts the new containers one at a time and stops the old containers one at a time. The problem is that it takes a few minutes to download the new image, and then more time for the web server to start.

One solution I was considering is to run Nginx as a load balancer in front of the two web server replicas. Is there a better solution?

Best Answer

You should add a restart policy and a stop_grace_period to your compose file:

version: "3.1"
services:
    web:
        stop_grace_period: 10s
        deploy:
             replicas: 2
             restart_policy:
               condition: on-failure
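A restart policy and a grace period do not by themselves stop Swarm from taking a replica down before its replacement is serving. Below is a sketch of a commonly used complement, under stated assumptions. It tells Swarm to start each new task before stopping the old one. It also adds a health check, so that the new task only counts as running once it answers. This needs compose file format 3.4 or later (for order and start_period). The curl probe assumes the image contains curl and that the app answers HTTP on port 80 inside the container, so adjust it to your image.

```yaml
version: "3.4"                  # update_config.order and healthcheck.start_period need 3.4+
services:
    web:
        image: registry.gitlab.com/xxxx/xxxx:latest
        ports:
            - "8888:80"
        stop_grace_period: 10s  # time given to in-flight requests before the old task is killed
        healthcheck:
            # Assumption: curl exists in the image and the app answers on / inside the container
            test: ["CMD", "curl", "-f", "http://localhost/"]
            interval: 10s
            timeout: 5s
            retries: 3
            start_period: 30s   # failures during boot don't count against the task
        deploy:
            replicas: 2
            update_config:
                parallelism: 1      # replace one replica at a time
                delay: 10s          # pause between replicas
                order: start-first  # start the new task before stopping the old one
            restart_policy:
                condition: on-failure
```

With start-first, the old replica keeps serving while the new image is pulled and the web server boots. The trade-off is that a node briefly runs one extra container during the update.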

First, let's recall a few definitions.

Docker images

Docker images are essentially a union filesystem plus metadata. You can inspect the contents of an image's union filesystem with the docker export command (which operates on a container created from that image), and you can inspect an image's metadata with the docker inspect command.

Data volumes

From the Docker user guide:

A data volume is a specially-designated directory within one or more containers that bypasses the Union File System to provide several useful features for persistent or shared data.

It is important to note here that a given volume (the directory or file that contains the data) is reusable only as long as at least one Docker container uses it. Docker images don't have volumes; they only have metadata that tells where volumes will be mounted on the union filesystem. Data volumes aren't part of a container's union filesystem either, so where are they? Under /var/lib/docker/volumes on the Docker host (while containers are stored under /var/lib/docker/containers).

Data volume containers

There is nothing special about this type of container. Data volume containers are just stopped containers using a data volume, with the sole purpose of keeping at least one container attached to that volume. Remember, as soon as the last container (running or stopped) using a given data volume is deleted, that volume becomes unreachable through the docker run --volumes-from option.

Working with data volume containers

How to create a data volume container

The image used to create a data volume container doesn't matter, since such a container can remain stopped and still serve its purpose. So to create a data container named datatest_data for a volume in /datafolder, you only need to run:

docker run --name datatest_data --volume /datafolder busybox true

Here busybox is the image name (a conveniently small one) and true is a command we provide just to keep the Docker daemon from complaining about a missing command. Afterwards you have a stopped container named datatest_data whose sole purpose is to let you reach that volume with the --volumes-from option of the docker run command.

How to read from a data volume container

I know two ways of reading a data volume. The first one is through a container: if you cannot get a shell in an existing container that uses the data volume, you can run a new container with the --volumes-from option for the sole purpose of reading that data.

For instance:

docker run --rm --volumes-from datatest_data busybox cat /datafolder/data.txt

The other way is to copy the volume from the /var/lib/docker/volumes folder. You can discover the name of the volume in that folder by inspecting the metadata of one of the containers using the volume. See this answer for details.
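If you use docker-compose, the data volume container pattern (create datatest_data, then read it with --volumes-from) can be written in compose file format 2, which supports volumes_from. Format 3, which docker stack deploy uses, dropped volumes_from in favour of named volumes. Below is a minimal sketch that reuses the names from the commands above. The reader service name is hypothetical.

```yaml
version: "2"
services:
    datatest_data:
        image: busybox
        command: "true"             # quoted so YAML keeps it a string; the container just exits
        volumes:
            - /datafolder           # anonymous data volume, like --volume /datafolder
    reader:                         # hypothetical service, mirrors the docker run --rm ... cat example
        image: busybox
        command: cat /datafolder/data.txt
        volumes_from:
            - datatest_data         # like --volumes-from datatest_data
```

Note that Compose gives the container a project-prefixed name rather than exactly datatest_data, unless you also set container_name.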

Working with volumes (since Docker 1.9.0)

How to create a volume (since Docker 1.9.0)

Docker 1.9.0 introduced a new command, docker volume, which lets you create volumes:

docker volume create --name hello

How to read from a volume (since Docker 1.9.0)

Let's say you created a volume named hello with docker volume create --name hello. You can mount it in a container with the -v option:

docker run -v hello:/data busybox ls /data
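In a compose file, the same named volume can be declared rather than passed with -v. This is the mechanism the question's mysql-data volume uses. Below is a minimal sketch, assuming hello was already created with docker volume create, hence external: true. The reader service name is hypothetical.

```yaml
version: "3.1"
services:
    reader:                     # hypothetical service name
        image: busybox
        command: ls /data       # same as: docker run -v hello:/data busybox ls /data
        volumes:
            - hello:/data       # mount the named volume "hello" at /data
volumes:
    hello:
        external: true          # reuse the existing volume; Compose won't create its own
```

Without external: true, Compose would create a new volume prefixed with the project name instead of using hello. This sketch is meant for docker-compose up. Under docker stack deploy, Swarm would keep restarting a one-shot command like ls.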

About committing & pushing containers

It should now be clear that since data volumes aren't part of a container (the union filesystem), committing a container to produce a new docker image won't persist any data that would be in a data volume.

Making backups of data volumes

The Docker user guide has a nice article about making backups of data volumes.

A good article regarding volumes: http://container42.com/2014/11/03/docker-indepth-volumes/

Docker – Update Docker container without downtime

The ideal target scenario

Yes, you should use a load balancer and update one instance at a time. I'm not sure where inter-container communication comes in.

As an example, imagine you have a load balancer which serves your site A. Users connect only to it and know it only as "A". The load balancer knows that there are two or more backends (B, C, etc.); whether they're VMs or containers doesn't matter.

Then, you want to upgrade the backends, which in this case are Apache instances.

take B out of the eligible backends for the load balancer so it's no longer accepting any traffic.
wait for the currently-live requests to be served and existing connections closed.
update the container or underlying VM that serves B
restart B, wait for it to load and start working
test B to make sure it's serving new requests properly
add B back to the load balancer backend pool to re-enable traffic

Then, do the same process for C, D, etc.

Note that there is an open request for in-place upgrades of Docker containers, dating from Nov 2013, but it doesn't appear to have made much progress, so the approach above is what you should use in the meantime.

What to do for an existing live site

Presumably, you're asking this because you're already running a live site in this model and you would like to upgrade it without downtime. So, we need to get to the ideal target state above, but incrementally.

Let's assume that:

you have a DNS name pointing to your container
your container runs on some IP address
your users don't know the container's IP address and it's not hard-coded anywhere

If any of these assumptions is false, fix that first so that they all hold.

Then, follow these steps:

create a load balancer at a new IP and point it at the existing container as its only backend
change DNS to point to the load balancer rather than the container IP directly
add an identical Apache backend with the same VM + container setup
now you have a load balancer with two backends B and C, so follow the directions in the "ideal target scenario" section for upgrading them one at a time

How to update a load balancer

The easy (hosted) way

The easiest option is not to run your own load balancer. For example, if you're using a cloud platform that provides load balancing as a service, consider using it; maintaining and updating the load balancer is then not your concern.

The manual way

If you are running your own load balancer, adding another layer of indirection (i.e., DNS) will help. Let's assume the following:

that we have a host name resolving to the IP of our load balancer A which we would like to update
our load balancer has a backend pool of P1, P2, etc.

We proceed as follows:

create a new load balancer B with the new software version
add all backend pool instances P1, P2, etc. to our new load balancer B as backends
add B's IP address to the DNS resolution along with A
- now we're effectively using DNS as a load balancer
- if the entries for A and B are unweighted, they're effectively 50-50
- now watch to see how B performs, whether there are any errors, etc.
- if anything is wrong with B, undo as follows:
  1. remove B from the DNS config
  2. wait for the B entry in the DNS to disappear (i.e., wait for the TTL to expire)
  3. turn down B
assume you've done the "burn-in" test for B and everything is fine
update the priority and weight for B in DNS gradually
remove A from DNS entirely
wait for DNS TTL to expire; A should not be getting any requests anymore
turn down A

and you're done.

Details, diagrams, and tooling

See these write-ups and tools that can help you automate the process, but the general idea is the same:

The Moral

"All problems in computer science can be solved by another level of indirection, except of course for the problem of too many indirections." — David Wheeler