Docker – the purpose of the VOLUME instruction in a Dockerfile

In the documentation it says that the VOLUME instruction creates a mount point, but I created an image using

FROM alpine
RUN mkdir /myvol
RUN echo "hello world" > /myvol/greeting

and I was able to mount /myvol, or any other path on the container's filesystem, using docker run -v vol:/myvol myimage, and was able to see the data in /var/lib/docker/volumes/vol/_data on the host machine.

What difference would adding VOLUME /myvol to the Dockerfile make?

Best Answer

I struggled quite a bit to understand this and had to do some actual testing, as the documentation was a bit too vague for me.

With the VOLUME directive in the Dockerfile, you declare a volume that every container created from the image exposes, even if no volume is explicitly mounted at container creation time (for example with docker run -v <volume>:/data <image name>).

Instead, I can put a directive in the Dockerfile:

FROM alpine

RUN mkdir /data && echo "Some data" > /data/mydata
VOLUME /data

Build the image from the above Dockerfile (tagged voltest here, e.g. with docker build -t voltest .) and start a container from it:

docker run -ti --rm --name volume-test voltest

Inspect the running container:

docker container inspect volume-test

...
        "Mounts": [
            {
                "Type": "volume",
                "Name": "c4d070456dfa65540bd5c75b958930837bbf4277f4a4169b791679127f29a73a",
                "Source": "/var/snap/docker/common/var-lib-docker/volumes/c4d070456dfa65540bd5c75b958930837bbf4277f4a4169b791679127f29a73a/_data",
                "Destination": "/data",
                "Driver": "local",
                "Mode": "",
                "RW": true,
                "Propagation": ""
            }
        ]
...

As you can see, a volume is mounted at the container's /data directory. This anonymous volume was created automatically during container creation because of the VOLUME directive in the Dockerfile. Since the container was started with the --rm option, the volume is removed automatically when the container stops (assuming nothing else is using it at that time). You can confirm this by running docker volume ls after stopping the container.

This allows other containers to use such ad-hoc volumes, for example by mounting them with:
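
To check which volumes a container actually has without reading the full inspect output, the Mounts section can be filtered with a format template. This is only a minimal sketch: the helper name list_mounts is made up for illustration, and volume-test is the container name used above.

```shell
# list_mounts is a hypothetical helper, not part of Docker.
# It prints one "volume-name destination" pair per mount of the given container.
list_mounts() {
    docker container inspect \
        --format '{{range .Mounts}}{{.Name}} {{.Destination}}{{"\n"}}{{end}}' "$1"
}

# Usage (requires a running Docker daemon):
#   list_mounts volume-test    # while the container is running
#   docker volume ls           # after it stops, the anonymous volume should be gone
```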

docker run --rm -ti --name alpine-vol --volumes-from volume-test alpine sh

Check the /data directory in the newly started container; it will contain the data the original container wrote to the volume.

I definitely see a use for this when data needs to be shared between containers but does not need to persist after the original container has been removed (e.g. as part of a sidecar pattern). If data persistence is required, you can still explicitly mount a volume into the same directory.
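
For the persistent case, a named volume can be mounted over the path declared by VOLUME. The following is a sketch, assuming the voltest image built above; the helper name and the volume name mydata are placeholders. As far as I know, an empty named volume is populated with the image's content at that path on first use, and a named volume is not removed by --rm.

```shell
# run_with_named_volume is a hypothetical helper, not part of Docker.
# $1 = volume name, $2 = image. The named volume is mounted at /data and
# takes the place of the anonymous volume that VOLUME /data would create.
run_with_named_volume() {
    docker run --rm -v "$1:/data" "$2" cat /data/mydata
}

# Usage (requires Docker):
#   run_with_named_volume mydata voltest
#   docker volume ls    # mydata should still be listed after the container is gone
```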

Related Solutions

Docker – How to Backup and Restore Docker Volumes

You're right. Since multiple containers can each have their own volumes, you need to keep track of which volume corresponds to which container. How to do that depends on your setup: I use the name $NAME-data for the data container, so it's obvious to which container it belongs. That way it can be backed up like this:

# -r prints the raw path without JSON quotes
VOLUME=$(docker inspect "$NAME-data" | jq -r '.[0].Volumes["/path/in/container"]')
tar -C "$VOLUME" -czvf "$NAME.tar.gz" .

Now you just need to rebuild your image and recreate your data container:

cat "$NAME.tar.gz" | docker run --name "$NAME-data" -v /path/in/container \
                              -i busybox tar -C /path/in/container -xzf -

So this means you need to back up:
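
The tar half of this backup and restore cycle can be tried on an ordinary directory, without Docker. A minimal sketch, with hypothetical helper names backup_dir and restore_dir; it only shows that packing a directory's contents and unpacking them elsewhere round-trips the files.

```shell
# Hypothetical helpers illustrating only the tar part of the procedure.
# backup_dir SRC_DIR ARCHIVE: pack the contents of SRC_DIR into a gzip'd tarball.
# Pass an absolute ARCHIVE path outside SRC_DIR to stay clear of -C subtleties.
backup_dir() {
    tar -C "$1" -czf "$2" .
}

# restore_dir ARCHIVE DEST_DIR: unpack the tarball into DEST_DIR (created if missing).
restore_dir() {
    mkdir -p "$2" && tar -C "$2" -xzf "$1"
}

# Usage:
#   backup_dir "$VOLUME" "$PWD/$NAME.tar.gz"
#   restore_dir "$PWD/$NAME.tar.gz" /some/empty/dir
```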

Dockerfile
volume
volume path in container
name of the container the volume belongs to

Update: In the meantime I created a tool to back up containers and their volume(s) (container(s)): https://github.com/discordianfish/docker-backup and a backup image that can create backups and push them to S3: https://github.com/discordianfish/docker-lloyd

Docker data volumes with couchbase

You should consider preserving the entire /opt/couchbase/var directory in a volume, rather than just the /opt/couchbase/var/lib/couchbase/data subdirectory.

The reason is that there is "cluster state" stored in /opt/couchbase/var. If that's lost, Couchbase will think it's a brand new cluster when it starts in a new container instance.

I wrote up a blog post which walks through a complete example of spinning up Couchbase Server under Docker. The Dockerfile and all scripts used are on GitHub.
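
As a rough illustration of the advice above (not the setup from the blog post), the whole /opt/couchbase/var directory can be placed on a named volume. The helper name, the volume name, and the use of the official couchbase image from Docker Hub are assumptions; port mappings and cluster setup are left out.

```shell
# run_couchbase is a hypothetical helper; the "couchbase" image name and the
# volume name are assumptions, not taken from the answer.
# $1 = container name, $2 = named volume that holds all of /opt/couchbase/var
run_couchbase() {
    docker run -d --name "$1" -v "$2:/opt/couchbase/var" couchbase
}

# Usage (requires Docker):
#   run_couchbase cb1 couchbase-var
```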
