Kubernetes services timing out when accessing pods on different workers

amazon-web-services, kubeadm, kubernetes

I'm trying to stand up a pair of kubernetes workers on EC2 instances, and running into a problem where the service does not appear to "see" all of the pods that it should be able to see.

My exact environment is a pair of AWS Snowballs, Red and Blue, and my cluster looks like control, worker-red, and worker-blue [1]. I'm deploying a dummy python server that waits for a GET on port 8080, and replies with the local hostname. I've set it up with enough replicas that both worker-red and worker-blue have at least one pod each. Finally, I've created a service, the spec of which looks like

spec:
    type: NodePort
    selector:
        app: hello-server
    ports:
        - port: 8080
          targetPort: 8080
          nodePort: 30080
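
The server itself is nothing special; a minimal sketch of roughly what it does (not the exact code, just the idea: answer a GET on 8080 with the local hostname) would be:

# Minimal stand-in for the dummy server: replies to GET on :8080 with the pod's hostname.
import socket
from http.server import BaseHTTPRequestHandler, HTTPServer

class HelloHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Inside a pod, gethostname() returns the pod name, so the reply identifies the replica.
        body = f"greetings from {socket.gethostname()}\n".encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), HelloHandler).serve_forever()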

I can now check that my pods are up

kubectl get pods -o wide
NAME                                      READY   STATUS    RESTARTS   AGE   IP              NODE          NOMINATED NODE   READINESS GATES
hello-world-deployment-587468bdb7-hf4dq   1/1     Running   0          27m   192.168.1.116   worker.red    <none>           <none>
hello-world-deployment-587468bdb7-mclhm   1/1     Running   0          27m   192.168.1.126   worker.blue   <none>           <none>

Now I can try to curl them

curl worker-red:30080
greetings from hello-world-deployment-587468bdb7-hf4dq
curl worker-blue:30080
greetings from hello-world-deployment-587468bdb7-mclhm

That's what happens about half the time. The other half of the time, the curl fails with a timeout error. Specifically: curling worker-red will ONLY yield a response from hf4dq, and curling worker-blue will ONLY yield a response from mclhm. If I cordon and drain worker-blue so both of my pods are running on worker-red, there is never a timeout, and both pods will respond.

It seems like the NodePort service is not reaching pods that are not on the host I am curling. As I understand them, this isn't how services are supposed to work. What am I missing?

[1] If I set things up so that I have two workers both on Red, the same problem I'm describing happens, but the two-Snowball setup is my primary use case, so it's the one I'll concentrate on.

Best Answer

It is hard to say exactly what is wrong here, but there are some steps you can take to troubleshoot the issue:

  1. Debug the Pods; in particular, check whether there is anything suspicious in the logs:
  • kubectl logs ${POD_NAME} ${CONTAINER_NAME}

  • kubectl logs --previous ${POD_NAME} ${CONTAINER_NAME}

  2. Debug the Service, for example by checking the points below (a few example commands follow this list):
  • Does the Service exist?

  • Does the Service work by DNS name?

  • Does the Service work by IP?

  • Is the Service defined correctly?

  • Does the Service have any Endpoints?

  • Is the kube-proxy working?
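
For example, those checks map roughly onto commands like the following. I'm assuming the Service is named hello-server to match your selector, since its actual name isn't shown, so substitute your own names where needed:

# Does the Service exist, and is it defined correctly (selector, ports, nodePort)?
kubectl get service hello-server -o yaml

# Does the Service have any Endpoints? You should see one Pod IP per replica,
# including the Pod running on the other worker.
kubectl get endpoints hello-server

# Does the Service work by DNS name? (run from inside a Pod whose image has nslookup)
kubectl exec -it hello-world-deployment-587468bdb7-hf4dq -- nslookup hello-server

# Does the Service work by cluster IP? (take the IP from `kubectl get service`)
curl http://<cluster-ip>:8080/

# Is kube-proxy running on every node, and is there anything odd in its logs?
# (on a kubeadm cluster it runs as a DaemonSet in the kube-system namespace)
kubectl get pods -n kube-system -l k8s-app=kube-proxy -o wide
kubectl logs -n kube-system <kube-proxy-pod-name>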

Going through those steps will help you find the cause of your issue and also better understand the mechanics behind Services.
