Master fails after adding a second master


Runs under VirtualBox.
5 machines: 2 worker nodes, but I don't even get to that point.
1x load balancer, Ubuntu running HAProxy, on 192.168.20.10, configured like so:

frontend kubernetes-frontend
        bind 0.0.0.0:6443
        mode tcp
        option tcplog
        default_backend kubernetes-backend

backend kubernetes-backend
        mode tcp
        option tcplog
        option tcp-check
        balance roundrobin
        default-server inter 10s downinter 5s rise 2 fall 2 slowstart 60s maxconn 250 weight 100
        server kubernetes-master-1 192.168.20.21:6443 check
        server kubernetes-master-2 192.168.20.22:6443 check
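
To sanity-check the load balancer path (a minimal sketch; assumes curl and nc are available on a master node and that at least one API server is already up), you can hit the frontend directly:

    # Hit the HAProxy frontend; -k skips TLS verification.
    # /version is served to anonymous clients on a default kubeadm install.
    curl -k https://192.168.20.10:6443/version

    # Or just confirm the TCP port is reachable through the load balancer:
    nc -vz 192.168.20.10 6443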

2x master nodes, exact replicas of each other: kubeadm v1.19.4, Docker 19.03, CRI-O 1.17, Kubernetes v1.19.4.

kubernetes-master-1 192.168.20.21

kubernetes-master-2 192.168.20.22

Running the init command

sudo kubeadm init --control-plane-endpoint="192.168.20.10:6443" --upload-certs --apiserver-advertise-address=192.168.20.21 --pod-network-cidr=10.100.0.0/16

succeeds with

Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

You can now join any number of the control-plane node running the following command on each as root:

  kubeadm join 192.168.20.10:6443 --token c2p4af.9s3aapujrfjkjlho \
    --discovery-token-ca-cert-hash sha256:ff3fc8d5e1a7ee16e2d48362cef4e3fa53df4c8fd672e69c8fe2c9e5826ab0c9 \
    --control-plane --certificate-key 57d92a387afbd601fba5da9e310523fa5ac8dfcdf0fd70dd8624a9950ce06457

Please note that the certificate-key gives access to cluster sensitive data, keep it secret!
As a safeguard, uploaded-certs will be deleted in two hours; If necessary, you can use
"kubeadm init phase upload-certs --upload-certs" to reload certs afterward.

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join 192.168.20.10:6443 --token c2p4af.9s3aapujrfjkjlho \
    --discovery-token-ca-cert-hash sha256:ff3fc8d5e1a7ee16e2d48362cef4e3fa53df4c8fd672e69c8fe2c9e5826ab0c9 

(full output here)
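
At this point master-1 responds normally; a quick way to confirm the first control plane is up before joining the second (assuming admin.conf was copied to ~/.kube/config as shown above):

    # All control-plane pods on master-1 should be Running
    kubectl get pods -n kube-system -o wide

    # The node should report Ready once the CNI add-on is installed
    kubectl get nodes -o wide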

So far, so good, but when I run the join command on master-2, upon getting to

[etcd] Creating static Pod manifest for "etcd"

(full output here) it outputs one more line,

[etcd] Waiting for the new etcd member to join the cluster. This can take up to 40s

and then

[kubelet-check] Initial timeout of 40s passed.

and that's all. master-1 (which responded before) now responds to a

kubectl cluster-info

like this:

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
Error from server: etcdserver: request timed out

The suggested command returns the following output:

kubectl cluster-info dump
Error from server (InternalError): an error on the server ("") has prevented the request from succeeding (get nodes)

That's it. It doesn't matter whether I install the network add-on first or not (I'm using Calico); I get the same results.
The same image works with a single master: I can add nodes and run commands. But this, no matter which guide I follow, always fails.
I've checked etcd (on master-1), and it was (or is) running before the join is executed on master-2. It's also listening on the right address (192.168.20.21), not localhost.
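
For reference, the checks on master-1 looked roughly like this (a sketch; assumes the default kubeadm static-pod layout and that crictl and ss are available; use docker ps with Docker):

    # etcd static pod is running
    sudo crictl ps | grep etcd

    # etcd listens on the node IP (2379 client, 2380 peer), not just 127.0.0.1
    sudo ss -tlnp | grep -E '2379|2380'

    # the addresses kubeadm baked into the static pod manifests
    grep advertise-address /etc/kubernetes/manifests/kube-apiserver.yaml
    grep listen-peer-urls /etc/kubernetes/manifests/etcd.yaml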

Any help would be great!
Thanks!

Best Answer

OK! So this was all fixed by adding --apiserver-advertise-address=192.168.20.22 to the join command on the second master. So when you run the join command on the secondary master(s), make sure you add

--apiserver-advertise-address=

followed by the address of that server: not the first master, but the master you are joining. Presumably this is because, without the flag, kubeadm advertises the address of the interface holding the default route, which on a VirtualBox/Vagrant setup is usually the NAT adapter rather than the host-only network, so the new etcd member advertises an address the existing cluster cannot reach.
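
Concretely, using the token and certificate key from the init output above, the join command on master-2 becomes:

    sudo kubeadm join 192.168.20.10:6443 --token c2p4af.9s3aapujrfjkjlho \
        --discovery-token-ca-cert-hash sha256:ff3fc8d5e1a7ee16e2d48362cef4e3fa53df4c8fd672e69c8fe2c9e5826ab0c9 \
        --control-plane --certificate-key 57d92a387afbd601fba5da9e310523fa5ac8dfcdf0fd70dd8624a9950ce06457 \
        --apiserver-advertise-address=192.168.20.22

The plain worker join command (without --control-plane) is unchanged.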
