Docker – How to remove old and unused Docker images

dockerdocker-image

When running Docker for a long time, there are a lot of images in system. How can I remove all unused Docker images at once safety to free up the storage?

In addition, I also want to remove images pulled months ago, which have the correct TAG.

So, I'm not asking for removing untagged images only. I'm searching for a way to remove general unused images, which includes both untagged and other images such as pulled months ago with correct TAG.

Best Answer

Update Sept. 2016: Docker 1.13: PR 26108 and commit 86de7c0 introduce a few new commands to help facilitate visualizing how much space the docker daemon data is taking on disk and allowing for easily cleaning up "unneeded" excess.

docker system prune will delete ALL dangling data (i.e. In order: containers stopped, volumes without containers and images with no containers). Even unused data, with -a option.

You also have:

For unused images, use docker image prune -a (for removing dangling and ununsed images).
Warning: 'unused' means "images not referenced by any container": be careful before using -a.

As illustrated in A L's answer, docker system prune --all will remove all unused images not just dangling ones... which can be a bit too much.

Combining docker xxx prune with the --filter option can be a great way to limit the pruning (docker SDK API 1.28 minimum, so docker 17.04+)

The currently supported filters are:

until (<timestamp>) - only remove containers, images, and networks created before given timestamp
label (label=<key>, label=<key>=<value>, label!=<key>, or label!=<key>=<value>) - only remove containers, images, networks, and volumes with (or without, in case label!=... is used) the specified labels.

See "Prune images" for an example.

Original answer (Sep. 2016)

I usually do:

docker rmi $(docker images --filter "dangling=true" -q --no-trunc)

I have an [alias for removing those dangling images: drmi]13

The dangling=true filter finds unused images

That way, any intermediate image no longer referenced by a labelled image is removed.

I do the same first for exited processes (containers)

alias drmae='docker rm $(docker ps -qa --no-trunc --filter "status=exited")'

As haridsv points out in the comments:

Technically, you should first clean up containers before cleaning up images, as this will catch more dangling images and less errors.

Jess Frazelle (jfrazelle) has the bashrc function:

dcleanup(){
    docker rm -v $(docker ps --filter status=exited -q 2>/dev/null) 2>/dev/null
    docker rmi $(docker images --filter dangling=true -q 2>/dev/null) 2>/dev/null
}

To remove old images, and not just "unreferenced-dangling" images, you can consider docker-gc:

A simple Docker container and image garbage collection script.

Containers that exited more than an hour ago are removed.

Images that don't belong to any remaining container after that are removed.

Related Solutions

Docker – How is Docker different from a virtual machine

Docker originally used LinuX Containers (LXC), but later switched to runC (formerly known as libcontainer), which runs in the same operating system as its host. This allows it to share a lot of the host operating system resources. Also, it uses a layered filesystem (AuFS) and manages networking.

AuFS is a layered file system, so you can have a read only part and a write part which are merged together. One could have the common parts of the operating system as read only (and shared amongst all of your containers) and then give each container its own mount for writing.

So, let's say you have a 1 GB container image; if you wanted to use a full VM, you would need to have 1 GB x number of VMs you want. With Docker and AuFS you can share the bulk of the 1 GB between all the containers and if you have 1000 containers you still might only have a little over 1 GB of space for the containers OS (assuming they are all running the same OS image).

A full virtualized system gets its own set of resources allocated to it, and does minimal sharing. You get more isolation, but it is much heavier (requires more resources). With Docker you get less isolation, but the containers are lightweight (require fewer resources). So you could easily run thousands of containers on a host, and it won't even blink. Try doing that with Xen, and unless you have a really big host, I don't think it is possible.

A full virtualized system usually takes minutes to start, whereas Docker/LXC/runC containers take seconds, and often even less than a second.

There are pros and cons for each type of virtualized system. If you want full isolation with guaranteed resources, a full VM is the way to go. If you just want to isolate processes from each other and want to run a ton of them on a reasonably sized host, then Docker/LXC/runC seems to be the way to go.

For more information, check out this set of blog posts which do a good job of explaining how LXC works.

Why is deploying software to a docker image (if that's the right term) easier than simply deploying to a consistent production environment?

Deploying a consistent production environment is easier said than done. Even if you use tools like Chef and Puppet, there are always OS updates and other things that change between hosts and environments.

Docker gives you the ability to snapshot the OS into a shared image, and makes it easy to deploy on other Docker hosts. Locally, dev, qa, prod, etc.: all the same image. Sure you can do this with other tools, but not nearly as easily or fast.

This is great for testing; let's say you have thousands of tests that need to connect to a database, and each test needs a pristine copy of the database and will make changes to the data. The classic approach to this is to reset the database after every test either with custom code or with tools like Flyway - this can be very time-consuming and means that tests must be run serially. However, with Docker you could create an image of your database and run up one instance per test, and then run all the tests in parallel since you know they will all be running against the same snapshot of the database. Since the tests are running in parallel and in Docker containers they could run all on the same box at the same time and should finish much faster. Try doing that with a full VM.

From comments...

Interesting! I suppose I'm still confused by the notion of "snapshot[ting] the OS". How does one do that without, well, making an image of the OS?

Well, let's see if I can explain. You start with a base image, and then make your changes, and commit those changes using docker, and it creates an image. This image contains only the differences from the base. When you want to run your image, you also need the base, and it layers your image on top of the base using a layered file system: as mentioned above, Docker uses AuFS. AuFS merges the different layers together and you get what you want; you just need to run it. You can keep adding more and more images (layers) and it will continue to only save the diffs. Since Docker typically builds on top of ready-made images from a registry, you rarely have to "snapshot" the whole OS yourself.

Docker – How to get a Docker container’s IP address from the host

The --format option of inspect comes to the rescue.

Modern Docker client syntax is:

docker inspect -f '{{range.NetworkSettings.Networks}}{{.IPAddress}}{{end}}' container_name_or_id

Old Docker client syntax is:

docker inspect --format '{{ .NetworkSettings.IPAddress }}' container_name_or_id

These commands will return the Docker container's IP address.

As mentioned in the comments: if you are on Windows, use double quotes " instead of single quotes ' around the curly braces.

Best Answer

Original answer (Sep. 2016)

Related Solutions

Docker – How is Docker different from a virtual machine

Docker – How to get a Docker container’s IP address from the host

Related Topic