Docker stats with CPU percentage more than 100

cpu-usage, docker, docker-container

I have a question about the docker stats command, if anyone can help me. I am new to Docker and I want to monitor the CPU usage of a Docker container.

The physical machine has 8 cores (CPU0…CPU7). I have already created a container and limited its CPU resources to one core (CPU0) using the following command:
docker run -itd --cpuset-cpus=0 -p 8081:8080 binfalse/bives-webapp

I stress the container by sending requests from JMeter and then monitor the container's CPU usage via the docker stats command, which gives me values greater than 100%.

I don't understand why it gives more than 100% even though only one core is allocated to the container. Do you have any idea about the reason? Does this CPU value represent the CPU usage of some system processes in addition to the container's?

Thanks in advance for your help.

docker version:
Client:
Version: 17.06.0-ce
API version: 1.30
Go version: go1.8.3
Git commit: 02c1d87
Built: Fri Jun 23 21:23:31 2017
OS/Arch: linux/amd64

Server:
Version: 17.06.0-ce
API version: 1.30 (minimum version 1.12)
Go version: go1.8.3
Git commit: 02c1d87
Built: Fri Jun 23 21:19:04 2017
OS/Arch: linux/amd64
Experimental: true

docker info result:
Containers: 2
Running: 1
Paused: 0
Stopped: 1
Images: 10
Server Version: 17.06.0-ce
Storage Driver: aufs
Root Dir: /var/lib/docker/aufs
Backing Filesystem: extfs
Dirs: 141
Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: cfb82a876ecc11b5ca0977d1733adbe58599088a
runc version: 2d41c047c83e09a6d61d464906feb2a2f3c52aa4
init version: 949e6fa
Security Options:
apparmor
seccomp
Profile: default
Kernel Version: 4.4.0-98-generic
Operating System: Ubuntu 16.04.2 LTS
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 15.56GiB
Name: logti048131
ID: RHOG:IR6N:FVC4:YDI5:A6T4:QA4Y:DDYF:7HZN:AI3L:WVLE:BNHY:6YNV
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Experimental: true
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false

WARNING: No swap limit support

Best Answer

On Linux, cgroups and Docker CPU stats deal in "time slices" of CPU: the number of nanoseconds the CPU has been in use. To get the percentage, the container cgroup's value for "time used" is compared to the overall system value for "time available" from /proc/stat.

Because the stored "time slice" values are cumulative, the current values are compared to the previously collected values to get a more instantaneous percentage. I think this comparison is the basis of the issue.
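For reference, the client derives the percentage roughly as (container delta / system delta) * online CPUs * 100. Below is a hand-rolled sketch of the same calculation against the cgroup v1 files; the path, the one-second interval, and $CID (a full container ID) are illustrative, and USER_HZ is assumed to be 100:

# Sample the container's cumulative CPU time and the system's total CPU
# time twice, one second apart, then compute the percentage.
CG=/sys/fs/cgroup/cpuacct/docker/$CID/cpuacct.usage
sys_ns() { awk '/^cpu /{t=0; for(i=2;i<=NF;i++) t+=$i; print t*10000000}' /proc/stat; }
c1=$(cat "$CG"); s1=$(sys_ns)   # first sample: container ns, system ns
sleep 1
c2=$(cat "$CG"); s2=$(sys_ns)   # second sample, one second later
# percentage = (container delta / system delta) * online CPUs * 100
awk -v c=$((c2 - c1)) -v s=$((s2 - s1)) -v n=$(nproc) 'BEGIN { printf "%.2f%%\n", c / s * n * 100 }'

If both reads in each sample happened at exactly the same instant, a container pinned to one core could never exceed 100%; the trouble described below is that they don't.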

Stats Collection

The docker stats command actually does a lot of the legwork for this in the client. The client queries for all containers, watches events for container starts and stops, and opens an individual stats stream for each running container. These streams of container stats are used to calculate the percentages on each dump of stats data from a stream.

For the container stats stream, the Docker daemon collects the system's used CPU time first. It then uses libcontainer to read in a container's cgroup files and parse the text into the values held in libcontainer's stats data structures. That is all then sent to the client as a JSON response for processing.
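You can watch that JSON yourself by querying the daemon's stats endpoint directly (the container ID here is a placeholder):

curl --unix-socket /var/run/docker.sock 'http://localhost/containers/<container-id>/stats?stream=false'

The response carries cpu_stats.cpu_usage.total_usage and cpu_stats.system_cpu_usage for the current sample, plus a precpu_stats copy of the previous sample, which is exactly the pair of deltas the percentage is computed from.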

I believe at least part of the problem stems from reading and parsing the /proc/stat system information and the container cgroup stats at different times. Every time the goroutine that reads the container info is delayed a bit, more nanoseconds are included in that sample than in the system sample. As the collection process is scheduled to run every X seconds, the next read then includes fewer total nanoseconds, so on a busy system the values can bounce up and then back down by the same amount, as there is not a full "tick" included in the second sample.
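To put illustrative numbers on it: take a container pinned to one fully busy core on an 8-core machine, sampled once per second. If the cgroup read lags the /proc/stat read by 100 ms, the container delta covers 1.1 s of usage against a 1.0 s system delta across 8 cores: 1.1e9 / 8e9 * 8 * 100 = 110%. The next interval then covers only 0.9 s of container time and reports 90%, even though the container's real usage was a steady 100% throughout.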

The issue compounds the more containers you run and the busier the system gets. The stats gathering and forwarding to the client seems to be a relatively heavyweight process; just running docker stats with a large number of containers is enough to cause more inaccuracy. My best guess is contention between the goroutines that are all trying to read the stats, though I'm not sure that would account for quite the level of inaccuracy Docker shows. Either I'm completely wrong or there's something else adding to the problem.

Linux cgroups

Each Docker container's usage is tracked in a cgroup. The CPU accounting information can be viewed via the cgroup file system:

→ find /sys/fs/cgroup/cpuacct/docker -type d
/sys/fs/cgroup/cpuacct/docker
/sys/fs/cgroup/cpuacct/docker/f0478406663bb57d597d4a63a031fc2e841de279a6f02d206b27eb481913c0ec
/sys/fs/cgroup/cpuacct/docker/5ac4753f955acbdf38beccbcc273f954489b2a00049617fdb0f9da6865707717
/sys/fs/cgroup/cpuacct/docker/a4e00d69819a15602cbfb4f86028a4175e16415ab9e2e9a9989fafa35bdb2edf
/sys/fs/cgroup/cpuacct/docker/af00983b1432d9ffa6de248cf154a1f1b88e6b9bbebb7da2485d94a38f9e7e15

→ cd /sys/fs/cgroup/cpuacct/docker/f0478406663bb57d597d4a63a031fc2e841de279a6f02d206b27eb481913c0ec
→ ls -l
total 0
-rw-r--r--    1 root     root             0 Nov 20 22:31 cgroup.clone_children
-rw-r--r--    1 root     root             0 Nov 20 04:35 cgroup.procs
-r--r--r--    1 root     root             0 Nov 20 21:51 cpuacct.stat
-rw-r--r--    1 root     root             0 Nov 20 21:51 cpuacct.usage
-r--r--r--    1 root     root             0 Nov 20 22:31 cpuacct.usage_all
-r--r--r--    1 root     root             0 Nov 20 21:51 cpuacct.usage_percpu
-r--r--r--    1 root     root             0 Nov 20 22:31 cpuacct.usage_percpu_sys
-r--r--r--    1 root     root             0 Nov 20 22:31 cpuacct.usage_percpu_user
-r--r--r--    1 root     root             0 Nov 20 22:31 cpuacct.usage_sys
-r--r--r--    1 root     root             0 Nov 20 22:31 cpuacct.usage_user
-rw-r--r--    1 root     root             0 Nov 20 22:31 notify_on_release
-rw-r--r--    1 root     root             0 Nov 20 22:31 tasks

→ cat cpuacct.usage_percpu
3625488147 6265485043 6504277830 

Each value is the cumulative usage in nanoseconds on that CPU.
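Summing the three per-CPU values above gives 16,395,251,020 ns, i.e. roughly 16.4 seconds of CPU time consumed by this container so far; cpuacct.usage holds the same total directly.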

→ grep -w ^cpu /proc/stat
cpu  475761 0 10945 582794 2772 0 159 0 0 0

Values here are in USER_HZ, i.e. 1/100 of a second on most systems, so they get some conversion in Docker.
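For example, the fields of the line above sum to 1,072,431 ticks; assuming the usual 100 ticks per second, Docker multiplies by 10,000,000 to get 10,724,310,000,000 ns of total system CPU time for the "time available" side of the delta calculation.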
