I'm following the guide from the Linux Foundation "Kubernetes Administrator" course and am stuck deploying a simple app. I think the trouble starts even earlier than the app deployment itself.
I've created a master and a worker node, and they seem to be OK:
$ kubectl get nodes
NAME                       STATUS   ROLES    AGE   VERSION
ubuntu-training-server-1   Ready    master   63m   v1.19.1
ubuntu-training-server-2   Ready    <none>   57m   v1.19.1
But something is wrong here:
$ kubectl get pods --all-namespaces
NAMESPACE     NAME                                               READY   STATUS              RESTARTS   AGE
default       nginx-6799fc88d8-556z4                             0/1     ContainerCreating   0          50m
kube-system   calico-kube-controllers-69496d8b75-thcl8           1/1     Running             1          63m
kube-system   calico-node-gl885                                  0/1     CrashLoopBackOff    20         58m
kube-system   calico-node-jvc59                                  1/1     Running             1          63m
kube-system   coredns-f9fd979d6-hjfst                            1/1     Running             1          64m
kube-system   coredns-f9fd979d6-kvx42                            1/1     Running             1          64m
kube-system   etcd-ubuntu-training-server-1                      1/1     Running             1          64m
kube-system   kube-apiserver-ubuntu-training-server-1            1/1     Running             1          64m
kube-system   kube-controller-manager-ubuntu-training-server-1   1/1     Running             1          64m
kube-system   kube-proxy-9899t                                   1/1     Running             1          58m
kube-system   kube-proxy-z6b22                                   1/1     Running             1          64m
kube-system   kube-scheduler-ubuntu-training-server-1            1/1     Running             1          64m
Not all of the pods are ready. If I try to get details about the failing pod, I see:
$ kubectl logs -n kube-system calico-node-gl885
Error from server (NotFound): the server could not find the requested resource ( pods/log calico-node-gl885)
And when I try to deploy nginx I get:
$ kubectl create deployment nginx --image=nginx
deployment.apps/nginx created
and
$ kubectl get deployments
NAME    READY   UP-TO-DATE   AVAILABLE   AGE
nginx   0/1     1            0           52m
There are problems here too:
$ kubectl get events
...
92s Warning FailedCreatePodSandBox pod/nginx-6799fc88d8-556z4 Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "4c451bc2c92f555c930f84e4e8b7082a03dd2824cf50948d348893ebea488d93" network for pod "nginx-6799fc88d8-556z4": networkPlugin cni failed to set up pod "nginx-6799fc88d8-556z4_default" network: stat /var/lib/calico/nodename: no such file or directory: check that the calico/node container is running and has mounted /var/lib/calico/
...
I see no /var/lib/calico/nodename on the worker node, only on the master, and the guide only spoke about running kubectl apply -f calico.yaml on the master.
Could anybody help me get rid of the Calico errors? I've tried to search and have seen similar cases, but they seem to be about something different.
UPDATE
I've found a possible networking conflict: the Calico config contained 192.168.0.0/16, while my VirtualBox adapter was on 192.168.56.0/24. So I reset the cluster, changed the Calico config and networking/podSubnet in kubeadm-config.yaml to 192.168.0.0/24, and initialized the cluster again.
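The conflict can be confirmed with Python's stdlib ipaddress module (a quick sanity check of the subnets above, not part of the course material):

```python
import ipaddress

# Calico's default IPv4 pool and the VirtualBox host-only network.
calico_pool = ipaddress.ip_network("192.168.0.0/16")
vbox_net = ipaddress.ip_network("192.168.56.0/24")

# The host-only network sits inside Calico's default pool, so pod
# addresses can clash with the addresses of the VMs themselves.
print(calico_pool.overlaps(vbox_net))  # True
```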
The new status is as follows. The nodes seem OK:
$ kubectl get nodes
NAME                       STATUS   ROLES    AGE   VERSION
ubuntu-training-server-1   Ready    master   39m   v1.19.1
ubuntu-training-server-2   Ready    <none>   38m   v1.19.1
The events seem OK too:
$ kubectl get events
LAST SEEN TYPE REASON OBJECT MESSAGE
36m Normal Starting node/ubuntu-training-server-1 Starting kubelet.
36m Normal NodeHasSufficientMemory node/ubuntu-training-server-1 Node ubuntu-training-server-1 status is now: NodeHasSufficientMemory
36m Normal NodeHasNoDiskPressure node/ubuntu-training-server-1 Node ubuntu-training-server-1 status is now: NodeHasNoDiskPressure
36m Normal NodeHasSufficientPID node/ubuntu-training-server-1 Node ubuntu-training-server-1 status is now: NodeHasSufficientPID
36m Normal NodeAllocatableEnforced node/ubuntu-training-server-1 Updated Node Allocatable limit across pods
36m Normal NodeReady node/ubuntu-training-server-1 Node ubuntu-training-server-1 status is now: NodeReady
35m Normal RegisteredNode node/ubuntu-training-server-1 Node ubuntu-training-server-1 event: Registered Node ubuntu-training-server-1 in Controller
35m Normal Starting node/ubuntu-training-server-1 Starting kube-proxy.
35m Normal Starting node/ubuntu-training-server-2 Starting kubelet.
35m Normal NodeHasSufficientMemory node/ubuntu-training-server-2 Node ubuntu-training-server-2 status is now: NodeHasSufficientMemory
35m Normal NodeHasNoDiskPressure node/ubuntu-training-server-2 Node ubuntu-training-server-2 status is now: NodeHasNoDiskPressure
35m Normal NodeHasSufficientPID node/ubuntu-training-server-2 Node ubuntu-training-server-2 status is now: NodeHasSufficientPID
35m Normal NodeAllocatableEnforced node/ubuntu-training-server-2 Updated Node Allocatable limit across pods
22s Normal CIDRNotAvailable node/ubuntu-training-server-2 Node ubuntu-training-server-2 status is now: CIDRNotAvailable
35m Normal Starting node/ubuntu-training-server-2 Starting kube-proxy.
35m Normal RegisteredNode node/ubuntu-training-server-2 Node ubuntu-training-server-2 event: Registered Node ubuntu-training-server-2 in Controller
35m Normal NodeReady node/ubuntu-training-server-2 Node ubuntu-training-server-2 status is now: NodeReady
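The CIDRNotAvailable event above may actually be a consequence of the new podSubnet: by default the controller manager carves one /24 out of the pod subnet per node (the kubeadm default --node-cidr-mask-size is 24 — an assumption about this setup), and a /24 pod subnet contains only a single /24, so the second node cannot be assigned a CIDR. A sketch of the arithmetic:

```python
import ipaddress

NODE_CIDR_MASK = 24  # kube-controller-manager default --node-cidr-mask-size


def node_cidrs(pod_subnet: str):
    """Per-node pod CIDRs the controller manager could hand out."""
    return list(ipaddress.ip_network(pod_subnet).subnets(new_prefix=NODE_CIDR_MASK))


print(len(node_cidrs("192.168.0.0/24")))  # 1 -> only one node gets a CIDR
print(len(node_cidrs("192.168.0.0/16")))  # 256 -> room for many nodes
```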
And here is the new trouble: calico-kube-controllers-69496d8b75-gdbd7 has been starting for more than half an hour:
$ kubectl get pods --all-namespaces
NAMESPACE     NAME                                               READY   STATUS              RESTARTS   AGE
kube-system   calico-kube-controllers-69496d8b75-gdbd7           0/1     ContainerCreating   0          37m
kube-system   calico-node-8xjsm                                  0/1     CrashLoopBackOff    13         37m
kube-system   calico-node-zktsh                                  1/1     Running             0          37m
kube-system   coredns-f9fd979d6-7bkwn                            1/1     Running             0          39m
kube-system   coredns-f9fd979d6-rsws5                            1/1     Running             0          39m
kube-system   etcd-ubuntu-training-server-1                      1/1     Running             0          39m
kube-system   kube-apiserver-ubuntu-training-server-1            1/1     Running             0          39m
kube-system   kube-controller-manager-ubuntu-training-server-1   1/1     Running             0          39m
kube-system   kube-proxy-2tvjp                                   1/1     Running             0          39m
kube-system   kube-proxy-jkzbz                                   1/1     Running             0          39m
kube-system   kube-scheduler-ubuntu-training-server-1            1/1     Running             0          39m
UPDATE 2
Details about my setup.
$ cat kubeadm-config.yaml
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
kubernetesVersion: 1.19.1
controlPlaneEndpoint: "k8smaster:6443"
networking:
  podSubnet: 192.168.0.0/24
The cluster was initialized with:
kubeadm init --config=kubeadm-config.yaml --upload-certs
Best Answer
Got it. In my setup I have two VirtualBox VMs, each with two network interfaces: one to connect to the outside world (10.0.2.15) and one for the VMs to talk to each other (192.168.56.104, 192.168.56.105). In the kubeadm init log I found that it was using the first one, so I explicitly told kubeadm to use the internal IP. Here is the command with which I succeeded in creating the cluster and deploying a simple app into it. One sad thing: unfortunately, I could not find how to move the options I used on the command line into the config file.