I'm trying to stand up a pair of Kubernetes workers on EC2 instances, and I'm running into a problem where the service does not appear to "see" all of the pods that it should be able to see.
My exact environment is a pair of AWS Snowballs, Red and Blue, and my cluster looks like control, worker-red, and worker-blue [1]. I'm deploying a dummy Python server that waits for a GET on port 8080 and replies with the local hostname. I've set it up with enough replicas that worker-red and worker-blue each have at least one pod. Finally, I've created a service, the spec of which looks like this:
spec:
  type: NodePort
  selector:
    app: hello-server
  ports:
  - port: 8080
    targetPort: 8080
    nodePort: 30080
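For reference, the dummy server running in the pods can be sketched like this. The actual code isn't shown above, so the details here are assumptions; the only behaviour that matters for the question is that it answers any GET on port 8080 with the local hostname.

```python
import socket
from http.server import BaseHTTPRequestHandler, HTTPServer

class HelloHandler(BaseHTTPRequestHandler):
    """Replies to any GET with a greeting that names the local host."""

    def do_GET(self):
        body = f"greetings from {socket.gethostname()}\n".encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # suppress per-request log lines

def run(port=8080):
    # Inside the pod this listens on 8080, matching targetPort above.
    HTTPServer(("", port), HelloHandler).serve_forever()
```

Inside a pod, `socket.gethostname()` returns the pod name, which is why each reply identifies the pod that served it.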
I can now check that my pods are up:
kubectl get pods -o wide
NAME                                      READY   STATUS    RESTARTS   AGE   IP              NODE          NOMINATED NODE   READINESS GATES
hello-world-deployment-587468bdb7-hf4dq   1/1     Running   0          27m   192.168.1.116   worker.red    <none>           <none>
hello-world-deployment-587468bdb7-mclhm   1/1     Running   0          27m   192.168.1.126   worker.blue   <none>           <none>
Now I can try to curl them:
curl worker-red:30080
greetings from hello-world-deployment-587468bdb7-hf4dq
curl worker-blue:30080
greetings from hello-world-deployment-587468bdb7-mclhm
That's what happens about half the time. The other half of the time, the curl fails with a timeout. Specifically: curling worker-red will ONLY ever yield a response from hf4dq, and curling worker-blue will ONLY ever yield a response from mclhm. If I cordon and drain worker-blue so that both of my pods are running on worker-red, there is never a timeout, and both pods respond.
It seems like the NodePort service is not reaching pods that live on a different host from the one I am curling. As I understand them, services aren't supposed to work this way. What am I missing?
[1] If I set up such that I have two workers both on Red, the same problem I'm describing happens, but this is my primary use case so it's the one I'll concentrate on.
Best Answer
It is hard to say exactly what is wrong here, but there are some steps you can take to troubleshoot the issue. Start with the pod logs (and, if a container has crashed and restarted, the logs of its previous instance):
kubectl logs ${POD_NAME} ${CONTAINER_NAME}
kubectl logs --previous ${POD_NAME} ${CONTAINER_NAME}
Then work through the following questions:
Does the Service exist?
Does the Service work by DNS name?
Does the Service work by IP?
Is the Service defined correctly?
Does the Service have any Endpoints?
Is the kube-proxy working?
Going through those steps will help you find the cause of your issue and also give you a better understanding of the mechanics behind Services.
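Concretely, each of those questions maps to a quick check along these lines. This is a sketch: the service name hello-server comes from your selector, ${POD_NAME} and ${KUBE_PROXY_POD} are placeholders for you to fill in, and the kube-proxy label assumes a kubeadm-style cluster.

```shell
# Does the Service exist, and is it defined correctly?
kubectl get service hello-server -o yaml

# Does the Service work by DNS name? (run from inside any pod)
kubectl exec -it ${POD_NAME} -- nslookup hello-server

# Does the Service work by IP? (find the ClusterIP, then curl it from a node)
kubectl get service hello-server -o jsonpath='{.spec.clusterIP}'

# Does the Service have any Endpoints? An empty list here means the
# selector does not match any ready pods.
kubectl get endpoints hello-server

# Is kube-proxy running and healthy on every node?
kubectl get pods -n kube-system -l k8s-app=kube-proxy -o wide
kubectl logs -n kube-system ${KUBE_PROXY_POD}
```

Given your symptom (each node only ever answers with its own local pod), pay particular attention to the last two checks: the endpoints list should contain both pod IPs, and kube-proxy must be healthy on both workers for cross-node forwarding to work.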