Kubernetes Node Pool – Autoscale to 0 Nodes Issue

autoscaling, google-kubernetes-engine, kubernetes

I have a rather expensive workload that some colleagues need running occasionally on weekdays (not on any set schedule). I use Google Kubernetes Engine (GKE).

It consists of three statefulsets, each with one replica.

I've instructed them on how to turn it "on" and "off." To turn it "on," they scale each statefulset to 1 replica. To turn it "off," they scale each statefulset back down to 0 replicas.
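
In case it matters, "on" and "off" are just kubectl scale commands along these lines (the statefulset names are placeholders for my real ones):

$ # Turn the workload "on": one replica per statefulset
$ kubectl scale statefulset app-a app-b app-c --replicas=1

$ # Turn the workload "off": zero replicas, so no user pods remain
$ kubectl scale statefulset app-a app-b app-c --replicas=0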

Originally, I had a single autoscaling node pool with a default size of three nodes (the statefulsets each consume almost an entire node's worth of CPU and RAM). I observed that even after scaling everything down to 0, at least one (and sometimes two) nodes would still be there an hour or two later. I was expecting all of the nodes to eventually be removed, but that doesn't happen.

I noticed that the running nodes still had some pods, just in a different namespace. The remaining pods are all in the kube-system namespace, except for one in the custom-metrics namespace.
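
(For what it's worth, this is how I'm listing what is left on a lingering node; substitute the actual node name:)

$ # Show every pod, in any namespace, scheduled on the given node
$ kubectl get pods --all-namespaces -o wide \
      --field-selector spec.nodeName=<node-name>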

So then I thought, okay – maybe there are other services Kubernetes wants to run even when there are no user-defined workloads/pods. So I created another node pool, with a single very-small-but-adequate node. That node is big enough to run everything that Kubernetes reports is running in those non-default namespaces.

After the new node pool was running with one node, I then proceeded to manually resize the original node pool to 0. It was fine. I hoped at this point that I had a "system" node pool for running kube-system and other stuff, and a "user" node pool for running my own stuff.
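
Roughly, the commands for that step looked like this (the cluster name, pool names, and machine type are illustrative rather than my exact values):

$ # Small dedicated pool for the kube-system / custom-metrics pods
$ gcloud container node-pools create system-pool \
      --cluster my-cluster \
      --machine-type e2-small \
      --num-nodes 1

$ # Manually shrink the original "user" pool to zero
$ gcloud container clusters resize my-cluster \
      --node-pool pool-1 \
      --num-nodes 0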

For my next test, I scaled only one statefulset back up to 1 replica. Eventually a node came online and the statefulset's pod was running and ready. I then scaled it back down to 0 and waited… and waited… and the node did not go away.

What does it take to make the autoscaling node pool actually reach 0 nodes? Clearly I am missing something (or more than one thing), but I have had a hard time finding information about what it takes for the cluster autoscaler to shrink a node pool down to 0.

Any advice is appreciated.

Additional info

When I look at what's running on the node in the node pool I want to go to 0, here's what I see:

  Namespace                  Name                                                   CPU Requests  CPU Limits  Memory Requests  Memory Limits  AGE
  ---------                  ----                                                   ------------  ----------  ---------------  -------------  ---
  kube-system                fluentd-gcp-v3.1.1-mfkxf                               100m (0%)     1 (3%)      200Mi (0%)       500Mi (0%)     28m
  kube-system                kube-proxy-gke-tileperformance-pool-1-14d3671d-jl76    100m (0%)     0 (0%)      0 (0%)           0 (0%)         28m
  kube-system                prometheus-to-sd-htvnw                                 1m (0%)       3m (0%)     20Mi (0%)        20Mi (0%)      28m

If I try to drain the node, it complains that those pods are managed by a DaemonSet. I could force it, but obviously I am trying to avoid any manual intervention.
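
For reference, forcing it would look like this, which is exactly the kind of manual step I'm hoping to avoid (node name taken from the kube-proxy pod name above):

$ # --ignore-daemonsets skips the fluentd-gcp / prometheus-to-sd DaemonSet pods
$ kubectl drain gke-tileperformance-pool-1-14d3671d-jl76 --ignore-daemonsets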

Hack

To get the autoscaler to "work" and downsize to 0, I've temporarily added a nodeSelector to all the kube-system deployments so they are assigned to a separate pool for kube-system stuff. But there has to be a better way, right?
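
Concretely, the hack is a patch along these lines on each of the kube-system deployments, pinning them to the system pool via its GKE node-pool label (the deployment and pool names here are just examples):

$ kubectl -n kube-system patch deployment kube-dns-autoscaler --patch \
      '{"spec": {"template": {"spec": {"nodeSelector": {"cloud.google.com/gke-nodepool": "system-pool"}}}}}'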

Best Answer

On GKE 1.18, my experiments show that I have to add a node taint in order for the node pool to be able to shrink to zero:

$ gcloud container node-pools create ... \
      --min-nodes 0 \
      --max-nodes 2 \
      --node-taints=...  # Without a taint, the pool never scaled down to zero in my tests.
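
Spelled out a bit more (the pool name, taint key/value, and statefulset name below are illustrative, not required values), that looks roughly like this, together with the matching toleration the workload pods need so they can still schedule onto the tainted nodes:

$ gcloud container node-pools create user-pool \
      --cluster my-cluster \
      --enable-autoscaling \
      --min-nodes 0 \
      --max-nodes 2 \
      --num-nodes 1 \
      --node-taints=dedicated=user-workload:NoSchedule

$ # Each statefulset needs a toleration matching that taint, e.g.:
$ kubectl patch statefulset app-a --patch \
      '{"spec": {"template": {"spec": {"tolerations": [{"key": "dedicated", "operator": "Equal", "value": "user-workload", "effect": "NoSchedule"}]}}}}'

A nodeSelector or node affinity on the pool's cloud.google.com/gke-nodepool label can be added on top of that if the pods should only ever land in this pool.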