I'm running a single-master/single-node Kubernetes cluster in a CentOS 7 virtual machine, and I realized that the systemd process (as PID 1) is constantly consuming CPU.
[root@ip-10-0-0-66 ~]# ps aux | head -n2
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 7.7 0.0 129088 7720 ? Ss Jun21 639:41 /usr/lib/systemd/systemd --switched-root --system --deserialize 22
Furthermore, systemd is generating millions of log lines like this:
[root@ip-10-0-0-66 ~]# tail -n 10 /var/log/messages
Jun 27 12:49:14 ip-10-0-0-66 systemd: Created slice libcontainer_6148_systemd_test_default.slice.
Jun 27 12:49:14 ip-10-0-0-66 systemd: Removed slice libcontainer_6148_systemd_test_default.slice.
Jun 27 12:49:15 ip-10-0-0-66 systemd: Created slice libcontainer_6155_systemd_test_default.slice.
Jun 27 12:49:15 ip-10-0-0-66 systemd: Removed slice libcontainer_6155_systemd_test_default.slice.
Jun 27 12:49:15 ip-10-0-0-66 systemd: Created slice libcontainer_6155_systemd_test_default.slice.
Jun 27 12:49:15 ip-10-0-0-66 systemd: Removed slice libcontainer_6155_systemd_test_default.slice.
Jun 27 12:49:15 ip-10-0-0-66 systemd: Created slice libcontainer_6162_systemd_test_default.slice.
Jun 27 12:49:15 ip-10-0-0-66 systemd: Removed slice libcontainer_6162_systemd_test_default.slice.
Jun 27 12:49:15 ip-10-0-0-66 systemd: Created slice libcontainer_6162_systemd_test_default.slice.
Jun 27 12:49:15 ip-10-0-0-66 systemd: Removed slice libcontainer_6162_systemd_test_default.slice.
There are almost 50 log lines registered per second, which are flooding the /var/log/messages file:
[root@ip-10-0-0-66 ~]# sudo wc -l /var/log/messages
5992826 /var/log/messages
[root@ip-10-0-0-66 ~]# sudo cat /var/log/messages | grep 'systemd_test_default' | wc -l
5987033
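To estimate the rate, one can count how many of those lines carry the current minute's timestamp (a rough check; it assumes the syslog timestamp format shown above and that the current minute has mostly elapsed):
[root@ip-10-0-0-66 ~]# grep 'systemd_test_default' /var/log/messages | grep -c "$(date '+%b %e %H:%M')"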
Eventually, the kubelet process also registers errors like these:
Jun 27 12:53:37 ip-10-0-0-66 systemd: Removed slice libcontainer_29206_systemd_test_default.slice.
Jun 27 12:53:37 ip-10-0-0-66 systemd: Created slice libcontainer_29206_systemd_test_default.slice.
Jun 27 12:53:37 ip-10-0-0-66 systemd: Removed slice libcontainer_29206_systemd_test_default.slice.
Jun 27 12:53:37 ip-10-0-0-66 kubelet: W0627 12:53:37.447052 5352 watcher.go:87] Error while processing event ("/sys/fs/cgroup/memory/libcontainer_29206_systemd_test_default.slice": 0x40000100 == IN_CREATE|IN_ISDIR): readdirent: no such file or directory
Jun 27 12:53:37 ip-10-0-0-66 kubelet: W0627 12:53:37.447117 5352 watcher.go:87] Error while processing event ("/sys/fs/cgroup/devices/libcontainer_29206_systemd_test_default.slice": 0x40000100 == IN_CREATE|IN_ISDIR): open /sys/fs/cgroup/devices/libcontainer_29206_systemd_test_default.slice: no such file or directory
Jun 27 12:53:37 ip-10-0-0-66 systemd: Created slice libcontainer_29225_systemd_test_default.slice.
Jun 27 12:53:37 ip-10-0-0-66 systemd: Removed slice libcontainer_29225_systemd_test_default.slice.
Jun 27 12:53:37 ip-10-0-0-66 systemd: Created slice libcontainer_29232_systemd_test_default.slice.
Jun 27 12:53:37 ip-10-0-0-66 systemd: Removed slice libcontainer_29232_systemd_test_default.slice.
How can I figure out what is causing this systemd CPU usage and these log messages?
Additional info
Versions:
- OS: CentOS Linux release 7.6.1810 (Core)
- Kernel: 3.10.0-957.21.2.el7.x86_64
- Kubernetes: v1.15.0
- Docker: 18.09.5 CE
- systemd: 219
Cgroup driver config:
$ docker info | grep -i cgroup
Cgroup Driver: systemd
$ cat /var/lib/kubelet/kubeadm-flags.env
KUBELET_KUBEADM_ARGS="--cgroup-driver=systemd"
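For completeness, the cgroup driver the running kubelet actually picked up can also be checked from its command line (a quick sketch; on this node it should print --cgroup-driver=systemd):
$ ps -o args= -C kubelet | tr ' ' '\n' | grep -- --cgroup-driver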
Best Answer
This is a known bug currently in kubernetes's kubelet process, and it isn't limited to CentOS-based systems: it affects any Linux (including Ubuntu) where you're using systemd as the cgroup driver for kubelet. It seems to have started showing up in the wild after version 1.14, but it may simply not have been as prevalent an issue before 1.14, as that's around when the official kubernetes docs started recommending systemd as the cgroup driver (source: https://kubernetes.io/docs/setup/cri/).
Prior to then, the other cgroup driver, cgroupfs, seems to have been the accepted/default approach. In fact, I only switched over because the recommendation started to appear while kubeadm init'ing a new 1.14.x cluster a few months back, which led me to find this github issue on exactly this situation. Basically, it seems to be an unreliable interaction between kubelet's handling of systemd and cAdvisor's probing, so perhaps the code is not quite ready for prime time yet. The full gory technical explanation is available in this comment.
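One way to observe that interaction directly (a rough sketch, assuming the inotify-tools package is installed) is to watch the memory cgroup hierarchy, where the transient test slices are created and removed in quick succession:
$ inotifywait -m -e create,delete /sys/fs/cgroup/memory/ | grep systemd_test_default
Each CREATE,ISDIR / DELETE,ISDIR event pair for a libcontainer_<pid>_systemd_test_default.slice directory should correspond to one Created/Removed pair in /var/log/messages above.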
The issue has been open since April without much sign of it being tackled (as it seems to be low priority).
The last commit to have touched this code is this one; however, it looks like it mainly changed the file naming/directory structure and layout of the code, and the cAdvisor code itself was introduced long before that.
Finally, while switching to cgroupfs is an option (as @hanx commented), it may lead to WORSE problems (instability under load) and, again, is NOT recommended by the official docs. Some people still opt for that route, just to get rid of the (mostly benign) error messages; a rough sketch of what that would involve is below.
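For reference only, on a kubeadm-provisioned node like the one above, the switch would look roughly like this (file paths match this setup and may differ elsewhere; drain the node first if it runs real workloads, and merge the exec-opts entry into any existing daemon.json rather than overwriting it):
$ cat /etc/docker/daemon.json
{
  "exec-opts": ["native.cgroupdriver=cgroupfs"]
}
$ cat /var/lib/kubelet/kubeadm-flags.env
KUBELET_KUBEADM_ARGS="--cgroup-driver=cgroupfs"
$ systemctl restart docker kubelet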