I don't know where to look for hints.
We have installed gitlab-runners using a Helm chart in our development cluster. Most of the time this works, but in the last week or so we have experienced pods being stuck in Pending state without any further logs. At some point, which I cannot define better, all pods get scheduled on nodes, then the next batch is stuck in Pending again.
We use GKE and have set up a node pool of preemptible nodes only for gitlab-runner pods. We run Kubernetes v1.15.4-gke.18.
We know there are several reasons for pods being stuck in Pending, but I always expect some form of logs/indication when running kubectl describe <GITLAB_RUNNER_POD> or kubectl get events. The problem is, there is none. No events.
We have Stackdriver logging enabled and I can see Kubernetes Apiservice Requests logs under Kubernetes Cluster, but they don't contain anything meaningful to me.
Any ideas where to look?
Best Answer
Posting this answer to give a more general idea of where to look for information on why a Pod is in Pending state, as for now it's impossible to tell on this specific setup.
The ways to check why a Pod can be in Pending state:
$ kubectl describe pod POD_NAME
$ kubectl get events -A
Cloud Logging (more on that below)
Assuming the following situation, where a Pod is in Pending state:
$ kubectl get pods
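The exact listing depends on the cluster; an illustrative example of a Pod stuck this way (the pod name, age, and counts are made up):

```
NAME                    READY   STATUS    RESTARTS   AGE
runner-example-abc12    0/1     Pending   0          5m
```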
To get more information about its state you can run:
$ kubectl describe pod POD_NAME
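What follows depends on the cluster; for a Pod that cannot be scheduled because of CPU pressure, the Events section of the describe output typically ends with a FailedScheduling entry along these lines (node counts and age are illustrative):

```
Events:
  Type     Reason            Age   From               Message
  ----     ------            ----  ----               -------
  Warning  FailedScheduling  2m    default-scheduler  0/3 nodes are available: 3 Insufficient cpu.
```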
The Events part of the above output shows why the Pod is in Pending state (Insufficient CPU). You can also run:
$ kubectl get events
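Illustrative output for a Pod that failed scheduling (the pod name, timings, and node counts are made up):

```
LAST SEEN   TYPE      REASON             OBJECT                     MESSAGE
2m          Warning   FailedScheduling   pod/runner-example-abc12   0/3 nodes are available: 3 Insufficient cpu.
```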
Retrieving logs from Cloud Logging:
You can run the below query to get the Pods that were in Pending state.
Note that this query will not show the reason (like Insufficient CPU) why a Pod is in Pending state. There is a feature request on Issuetracker.google.com for this; you can follow it to receive further updates.
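A sketch of a Logs Explorer filter that could surface such Pods, assuming GKE's k8s_pod logging resource type and that the relevant message text lands in jsonPayload.message (CLUSTER_NAME and the payload field path are placeholders to adapt to what your logs actually contain):

```
resource.type="k8s_pod"
resource.labels.cluster_name="CLUSTER_NAME"
jsonPayload.message:"Pending"
```

The `:` operator performs a substring match in the Logging query language, so this matches any k8s_pod log entry whose message mentions Pending.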