Gcloud – Node-Pool Creation Hangs in Auto-Repair

Tags: cluster, google-cloud-platform, google-kubernetes-engine, kubernetes

Adding a new node-pool to an existing cluster is failing with no nodes registering.

The command used to add the node-pool is as follows (project name changed):

gcloud container --project my-project node-pools create hm-pool --cluster ds-cluster-west4 --zone europe-west4-c --node-version 1.16.9-gke.2 --machine-type n1-highmem-4 --image-type COS --disk-type pd-standard --disk-size 100 --metadata disable-legacy-endpoints=true --scopes logging-write,monitoring,pubsub,service-control,service-management,storage-full,taskqueue,trace --num-nodes 2 --enable-autoupgrade --enable-autorepair --max-surge-upgrade 1 --max-unavailable-upgrade 0

I get the following error message:

This will enable the autorepair feature for nodes. Please see https://cloud.google.com/kubernetes-engine/docs/node-auto-repair for more information on node autorepairs.
Creating node pool hm-pool...done.
ERROR: (gcloud.container.node-pools.create) Operation [<Operation
 clusterConditions: [<StatusCondition
 message: 'All cluster resources were brought up, but: only 0 nodes out of 2 have registered; cluster may be unhealthy.'>]
 detail: 'All cluster resources were brought up, but: only 0 nodes out of 2 have registered; cluster may be unhealthy.'
 endTime: '2020-06-04T15:17:05.810921209Z'
 name: 'operation-1591282299021-26295b28'
 nodepoolConditions: []
 operationType: OperationTypeValueValuesEnum(CREATE_NODE_POOL, 7)
 selfLink: 'https://container.googleapis.com/v1/projects/473462597806/zones/europe-west4-c/operations/operation-1591282299021-26295b28'
 startTime: '2020-06-04T14:51:39.021046271Z'
 status: StatusValueValuesEnum(DONE, 3)
 statusMessage: 'All cluster resources were brought up, but: only 0 nodes out of 2 have registered; cluster may be unhealthy.'
 targetLink: 'https://container.googleapis.com/v1/projects/473462597806/zones/europe-west4-c/clusters/ds-cluster-west4/nodePools/hm-pool'
 zone: 'europe-west4-c'>] finished with error: All cluster resources were brought up, but: only 0 nodes out of 2 have registered; cluster may be unhealthy.

On the console I see the message "Auto-repairing nodes in node pool hm-pool." and I see that hm-pool is updating. There are 0 nodes in the pool.

What am I doing wrong?

Best Answer

The problem was that the network tags applied to the existing cluster nodes were missing from the specification of the new node pool. I extracted the tags from one of the existing nodes using gcloud compute instances describe --format="value[delimiter=','](tags.items)" INSTANCE-NAME and passed the output as the argument to the --tags option of the node-pools create command. The node pool was then created successfully.
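The two steps above can be combined in a small script. This is only a sketch: the zone and cluster name come from the question, the function names node_tags and create_pool_with_tags are made up for illustration, and it assumes the default gcloud project is already set. The remaining flags from the original create command (machine type, scopes, and so on) would still need to be added.

```shell
#!/bin/sh
# Sketch: copy the network tags of an existing node onto a new node pool.
# ZONE and CLUSTER are taken from the question; the function names are
# hypothetical helpers, not part of gcloud.

ZONE="europe-west4-c"
CLUSTER="ds-cluster-west4"

# Print the network tags of an existing node ($1) as a comma-separated list.
node_tags() {
    gcloud compute instances describe "$1" \
        --zone "$ZONE" \
        --format="value[delimiter=','](tags.items)"
}

# Create a node pool ($1) that reuses the tags of an existing node ($2).
create_pool_with_tags() {
    tags=$(node_tags "$2") || return 1
    if [ -z "$tags" ]; then
        # Refuse to create a pool without tags, since that was the original failure.
        echo "no tags found on $2" >&2
        return 1
    fi
    gcloud container node-pools create "$1" \
        --cluster "$CLUSTER" \
        --zone "$ZONE" \
        --num-nodes 2 \
        --tags "$tags"
}
```

After sourcing the script, running create_pool_with_tags hm-pool INSTANCE-NAME (with INSTANCE-NAME replaced by one of the existing nodes) should look up the tags and pass them to node-pools create in one go.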