Kubernetes Master Node – Fixing Fluctuating Kubelet Service

amazon-web-services, docker, kubeadm, kubernetes, Ubuntu

I was trying to create a Kubernetes cluster using kubeadm. I spun up an Ubuntu 18.04 server, installed Docker (and made sure docker.service was running), and installed kubeadm, kubelet, and kubectl.

The following are the steps that I did:

sudo apt-get update
sudo apt-get install docker.io -y
sudo systemctl enable docker
sudo systemctl start docker
curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
sudo apt-add-repository "deb http://apt.kubernetes.io/ kubernetes-xenial main"
sudo apt-get install kubeadm kubelet kubectl -y
sudo apt-mark hold kubeadm kubelet kubectl 
kubeadm version
sudo swapoff -a
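
One thing to watch with swapoff -a: it only disables swap until the next reboot, and the kubelet refuses to start while swap is on. A common way to make it permanent (a sketch, assuming swap is mounted through /etc/fstab) is to comment out the swap entry:

sudo swapoff -a
sudo sed -i '/ swap / s/^\(.*\)$/#\1/' /etc/fstab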

sudo hostnamectl set-hostname master-node
sudo kubeadm init --pod-network-cidr=10.244.0.0/16

After I ran sudo kubeadm init --pod-network-cidr=10.244.0.0/16, I got the following error:

root@ip-172-31-10-50:/home/ubuntu# sudo kubeadm init --pod-network-cidr=192.168.0.0/16
[init] Using Kubernetes version: v1.23.1
[preflight] Running pre-flight checks
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Using existing ca certificate authority
[certs] Using existing apiserver certificate and key on disk
[certs] Using existing apiserver-kubelet-client certificate and key on disk
[certs] Using existing front-proxy-ca certificate authority
[certs] Using existing front-proxy-client certificate and key on disk
[certs] Using existing etcd/ca certificate authority
[certs] Using existing etcd/server certificate and key on disk
[certs] Using existing etcd/peer certificate and key on disk
[certs] Using existing etcd/healthcheck-client certificate and key on disk
[certs] Using existing apiserver-etcd-client certificate and key on disk
[certs] Using the existing "sa" key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Using existing kubeconfig file: "/etc/kubernetes/admin.conf"
[kubeconfig] Using existing kubeconfig file: "/etc/kubernetes/kubelet.conf"
[kubeconfig] Using existing kubeconfig file: "/etc/kubernetes/controller-manager.conf"
[kubeconfig] Using existing kubeconfig file: "/etc/kubernetes/scheduler.conf"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp 127.0.0.1:10248: connect: connection refused.

I tried to run kubeadm init --pod-network-cidr with Flannel's CIDR (10.244.0.0/16) and also with Calico's CIDR (192.168.0.0/16). However, I got the same error both times.

Also, I observed that the status of the kubelet on my EC2 instance was fluctuating. When I ran systemctl status kubelet.service, sometimes it was running and sometimes it was not, and it flips back and forth on its own. I think this is what makes kubeadm init fail, since the kubelet-check clearly says: "It seems like the kubelet isn't running or healthy".

After running systemctl status kubelet.service, I saw the error:

root@ip-172-31-10-50:/home/ubuntu# systemctl status kubelet.service
● kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
Drop-In: /etc/systemd/system/kubelet.service.d
└─10-kubeadm.conf
Active: activating (auto-restart) (Result: exit-code) since Wed 2021-12-29 17:52:35 UTC; 3s ago
Docs: https://kubernetes.io/docs/home/
Process: 22901 ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS (code=exited, status=1/FAILURE)
Main PID: 22901 (code=exited, status=1/FAILURE)

And when I keep running systemctl status kubelet.service, after a few seconds kubelet.service seems to be running, and after a few more seconds it fails again.

…skipping…
● kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
Drop-In: /etc/systemd/system/kubelet.service.d
└─10-kubeadm.conf
Active: active (running) since Thu 2021-12-30 18:50:49 UTC; 125ms ago
Docs: https://kubernetes.io/docs/home/
Main PID: 12895 (kubelet)
Tasks: 9 (limit: 4686)
CGroup: /system.slice/kubelet.service
└─12895 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/conf

I am not sure why kubelet is fluctuating this way.
Does anyone know how to fix this?

Best Answer

The error log has your answer:

failed to run Kubelet: misconfiguration: kubelet cgroup driver: "systemd" is different from docker cgroup driver: "cgroupfs"
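
That line comes from the kubelet's own logs. If it doesn't show up in the systemctl status output, you can pull it from the journal with standard journalctl usage:

sudo journalctl -u kubelet -n 50 --no-pager
sudo journalctl -u kubelet -f

The first command prints the last 50 log lines from the kubelet unit; the second follows the log live across the restart loop, which is handy while the service is flapping.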

See the Kubernetes documentation on how to configure a cgroup driver.
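
Concretely, that means telling Docker to use the systemd cgroup driver so it matches the kubelet. A sketch of the usual steps (the daemon.json below assumes you have no other Docker daemon settings to preserve):

cat <<EOF | sudo tee /etc/docker/daemon.json
{
  "exec-opts": ["native.cgroupdriver=systemd"]
}
EOF
sudo systemctl daemon-reload
sudo systemctl restart docker

The "Using existing ..." certificate lines in your kubeadm init output also show leftover state from an earlier attempt, so reset before initializing again:

sudo kubeadm reset -f
sudo kubeadm init --pod-network-cidr=10.244.0.0/16

Once Docker and the kubelet agree on the cgroup driver, the kubelet should stop crash-looping and kubeadm init should get past the kubelet-check.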
