Kubernetes – Troubleshooting Network Issues in Cluster


I have a Kubernetes cluster with one master and three nodes inside a VPN, and all nodes show Ready status. It was built using kubeadm and flannel. The VPN network has the range 192.168.1.0/16.

$ kubectl get nodes -o wide

NAME        STATUS   ROLES    AGE    VERSION   INTERNAL-IP     EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
k8-master   Ready    master   144d   v1.17.0   192.168.1.132   <none>        Ubuntu 18.04.3 LTS   4.15.0-72-generic   docker://18.9.7
k8-n1       Ready    <none>   144d   v1.17.0   192.168.1.133   <none>        Ubuntu 18.04.3 LTS   4.15.0-72-generic   docker://18.9.7
k8-n2       Ready    <none>   144d   v1.17.0   192.168.1.134   <none>        Ubuntu 18.04.3 LTS   4.15.0-72-generic   docker://18.9.7
k8-n3       Ready    <none>   144d   v1.17.0   192.168.1.135   <none>        Ubuntu 18.04.3 LTS   4.15.0-72-generic   docker://18.9.7

I can reach the nodes.

$ ping 192.168.1.133

PING 192.168.1.133 (192.168.1.133) 56(84) bytes of data.
64 bytes from 192.168.1.133: icmp_seq=1 ttl=64 time=0.219 ms
64 bytes from 192.168.1.133: icmp_seq=2 ttl=64 time=0.246 ms
64 bytes from 192.168.1.133: icmp_seq=3 ttl=64 time=0.199 ms
64 bytes from 192.168.1.133: icmp_seq=4 ttl=64 time=0.209 ms
^C
--- 192.168.1.133 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3071ms
rtt min/avg/max/mdev = 0.199/0.218/0.246/0.020 ms

$ ping 192.168.1.134

PING 192.168.1.134 (192.168.1.134) 56(84) bytes of data.
64 bytes from 192.168.1.134: icmp_seq=1 ttl=64 time=0.288 ms
64 bytes from 192.168.1.134: icmp_seq=2 ttl=64 time=0.272 ms
64 bytes from 192.168.1.134: icmp_seq=3 ttl=64 time=0.268 ms
^C
--- 192.168.1.134 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2032ms
rtt min/avg/max/mdev = 0.268/0.276/0.288/0.008 ms

$ ping 192.168.1.135

PING 192.168.1.135 (192.168.1.135) 56(84) bytes of data.
64 bytes from 192.168.1.135: icmp_seq=1 ttl=64 time=0.278 ms
64 bytes from 192.168.1.135: icmp_seq=2 ttl=64 time=0.221 ms
64 bytes from 192.168.1.135: icmp_seq=3 ttl=64 time=0.181 ms
^C
--- 192.168.1.135 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2030ms

To test whether networking worked, I created an nginx Deployment with two pods:

$ kubectl get pods -o wide

NAME                                READY   STATUS    RESTARTS   AGE   IP             NODE    NOMINATED NODE   READINESS GATES
nginx-deployment-574b87c764-2gz8t   1/1     Running   0          25m   192.168.2.12   k8-n2   <none>           <none>
nginx-deployment-574b87c764-rst8x   1/1     Running   0          25m   192.168.1.17   k8-n1   <none>           <none>

$ kubectl get svc

NAME               TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)        AGE
kubernetes         ClusterIP   10.96.0.1       <none>        443/TCP        3d17h
nginx-deployment   NodePort    10.96.211.211   <none>        80:31577/TCP   13s
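
For reference, a NodePort Service matching the listing above could be expressed roughly like this (a sketch; the `app: nginx` selector label is an assumption, not shown in the question):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: nginx-deployment
spec:
  type: NodePort
  selector:
    app: nginx          # assumed label on the Deployment's pods
  ports:
    - port: 80          # ClusterIP port (10.96.211.211:80)
      targetPort: 80    # container port inside the pods
      nodePort: 31577   # exposed on every node's IP
```

With such a Service, `curl <any-node-ip>:31577` and `curl 10.96.211.211:80` should both reach the nginx pods, which is exactly what fails below.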

But I can't connect to it:

$ curl k8-n1:31577
curl: (7) Failed to connect to k8-n1 port 31577: Connection refused
$ curl k8-n2:31577
curl: (7) Failed to connect to k8-n2 port 31577: Connection refused
$ curl k8-n3:31577
curl: (7) Failed to connect to k8-n3 port 31577: Connection refused
$ curl 10.96.211.211:80
curl: (7) Failed to connect to 10.96.211.211 port 80: Connection refused
$ curl 192.168.1.17:80
curl: (7) Failed to connect to 192.168.1.17 port 80: No route to host
$ curl 192.168.1.17:31577
curl: (7) Failed to connect to 192.168.1.17 port 31577: No route to host
$ curl 192.168.1.133:31577
curl: (7) Failed to connect to 192.168.1.133 port 31577: Connection refused
$ curl 192.168.1.133:6443
curl: (7) Failed to connect to 192.168.1.133 port 6443: Connection refused

I had changed the defaults when setting up the cluster. It was initialized with:

sudo kubeadm init --pod-network-cidr=192.168.1.0/16 --apiserver-advertise-address=192.168.1.132

and I also changed the flannel network to 192.168.1.0/16, by editing its ConfigMap:

kubectl edit cm -n kube-system kube-flannel-cfg
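
For comparison, the stock kube-flannel manifest defines the pod network in the `net-conf.json` key of that ConfigMap, with a default of 10.244.0.0/16 (abridged sketch below). Whatever value is used here must match kubeadm's `--pod-network-cidr` and must not overlap the node subnet:

```yaml
kind: ConfigMap
apiVersion: v1
metadata:
  name: kube-flannel-cfg
  namespace: kube-system
data:
  net-conf.json: |
    {
      "Network": "10.244.0.0/16",
      "Backend": {
        "Type": "vxlan"
      }
    }
```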

The coredns pod's events after a restart:

  Normal   Scheduled  109s                default-scheduler  Successfully assigned kube-system/coredns-6955765f44-vwqgm to k8-n1
  Normal   Pulled     106s                kubelet, k8-n1     Container image "k8s.gcr.io/coredns:1.6.5" already present on machine
  Normal   Created    105s                kubelet, k8-n1     Created container coredns
  Normal   Started    105s                kubelet, k8-n1     Started container coredns
  Warning  Unhealthy  3s (x11 over 103s)  kubelet, k8-n1     Readiness probe failed: Get http://192.168.1.19:8181/ready: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
  Warning  Unhealthy  1s (x5 over 41s)    kubelet, k8-n1     Liveness probe failed: Get http://192.168.1.19:8080/health: dial tcp 192.168.1.19:8080: connect: no route to host
  Normal   Killing    1s                  kubelet, k8-n1     Container coredns failed liveness probe, will be restarted

I would appreciate any help or ask for more information.

Best Answer

While checking the question I noticed that the OP initialized the cluster with the pod network CIDR 192.168.1.0/16, which overlaps the nodes' own IP addresses (192.168.1.132–135). That is why pods received addresses such as 192.168.1.17 that collide with hosts on the VPN, why curl reports "No route to host" for pod IPs, and why the coredns readiness and liveness probes fail.

Re-initializing the cluster with a different, non-overlapping CIDR solved the issue.
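
The overlap can be demonstrated with Python's standard `ipaddress` module (a minimal sketch; the addresses are taken from the question, and 10.244.0.0/16 is flannel's default pod CIDR):

```python
# Check whether the chosen pod-network CIDR overlaps the node subnet.
import ipaddress

node_ip = ipaddress.ip_address("192.168.1.132")  # k8-master INTERNAL-IP

# strict=False is required here: 192.168.1.0/16 has host bits set, which is
# already a hint the CIDR was malformed (a clean /16 would be 192.168.0.0/16).
bad_pod_cidr = ipaddress.ip_network("192.168.1.0/16", strict=False)
good_pod_cidr = ipaddress.ip_network("10.244.0.0/16")

print(node_ip in bad_pod_cidr)   # True  -> pod IPs collide with node/VPN IPs
print(node_ip in good_pod_cidr)  # False -> no overlap, safe choice
```

Any CIDR for which the first check prints False for every node IP would avoid the collision.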
