Google Kubernetes Engine – Node Pool Autoscaling Issues

google-kubernetes-engine, graphics-processing-unit, kubernetes, nvidia

I am trying to run a machine learning job on GKE, and need to use a GPU.

I created a node pool with a Tesla K80, as described in this walkthrough.

I set the minimum node size to 0, and hoped that the autoscaler would automatically determine how many nodes I needed based on my jobs:

gcloud container node-pools create [POOL_NAME] \
--accelerator type=nvidia-tesla-k80,count=1 --zone [COMPUTE_ZONE] \
--cluster [CLUSTER_NAME] --num-nodes 3 --min-nodes 0 --max-nodes 5 \
--enable-autoscaling

Initially, there are no jobs that require GPUs, so the cluster autoscaler correctly downsizes the node pool to 0.

However, I then create a job with the following resource specification:

resources:
  requests:
    nvidia.com/gpu: "1"
  limits:
    nvidia.com/gpu: "1"

Here is the full job configuration. (Please note that this configuration is partially auto-generated. I have also removed some environment variables that are not pertinent to the issue).
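In outline, the job looks something like this (heavily trimmed; the names and image below are placeholders rather than my real values):

apiVersion: batch/v1
kind: Job
metadata:
  name: training-job                              # placeholder name
spec:
  template:
    spec:
      containers:
      - name: trainer                             # placeholder container name
        image: gcr.io/[PROJECT_ID]/trainer:latest # placeholder image
        resources:
          requests:
            nvidia.com/gpu: "1"
          limits:
            nvidia.com/gpu: "1"
      restartPolicy: Never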

The pod stays stuck in Pending with Insufficient nvidia.com/gpu until I manually resize the node pool to at least 1 node.

Is this a current limitation of GPU node pools, or did I overlook something?

Best Answer

The autoscaler supports scaling GPU node pools, including to and from 0.

One possible reason for this problem is if you have enabled Node Auto-Provisioning and set resource limits (via the UI or gcloud flags such as --max-cpu, --max-memory, etc.). Those limits apply to ALL autoscaling in the cluster, including node pools you created manually with autoscaling enabled (see the note in the documentation: https://cloud.google.com/kubernetes-engine/docs/how-to/node-auto-provisioning#resource_limits).

In particular, if you have enabled NAP and you want to autoscale node pools with GPUs, you need to set resource limits for GPUs as described in https://cloud.google.com/kubernetes-engine/docs/how-to/node-auto-provisioning#gpu_limits.
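As a rough sketch of what that looks like with gcloud (the limit values here are just examples - pick ones that fit your workloads):

gcloud container clusters update [CLUSTER_NAME] --zone [COMPUTE_ZONE] \
--enable-autoprovisioning --max-cpu 64 --max-memory 256 \
--min-accelerator type=nvidia-tesla-k80,count=0 \
--max-accelerator type=nvidia-tesla-k80,count=4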

Finally, autoprovisioning also supports GPUs, so (assuming you set the resource limits as described above) you don't actually need to create a node pool for your GPU workload - NAP will create one for you automatically.

===

Also, for future reference - if the autoscaler fails to create nodes for some of your pods, you can try to debug it using autoscaler events (the relevant commands are collected after this list):

  • On your pod (kubectl describe pod <your-pod>) there should be one of the following two events (it may take a minute until they show up):
    • TriggeredScaleUp - this means the autoscaler decided to add a node for this pod.
    • NotTriggerScaleUp - the autoscaler spotted your pod, but it doesn't think any node pool can be scaled up to help it. In 1.12 and later the event contains a list of reasons why adding nodes to different node pools wouldn't help the pod. This is usually the most useful event for debugging.
  • kubectl get events -n kube-system | grep cluster-autoscaler will give you events describing all autoscaler actions (scale-up, scale-down). If a scale-up was attempted but failed for whatever reason, it will also have events describing that.
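Putting the two checks together (the pod name is a placeholder, and it can take a minute or two for the events to show up):

# Events recorded for the pending pod - look for TriggeredScaleUp or NotTriggerScaleUp
kubectl describe pod <your-pod>

# All cluster-autoscaler actions (scale-up, scale-down, and any failures)
kubectl get events -n kube-system | grep cluster-autoscaler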

Note that events are only available in Kubernetes for 1 hour after they were created. You can see historical events in Stackdriver by going to the UI and navigating to Stackdriver -> Logging -> Logs and choosing "GKE Cluster Operations" in the drop-down.

Finally, you can check the current status of the autoscaler by running kubectl get configmap cluster-autoscaler-status -o yaml -n kube-system.
