Kubernetes Pod fails with OutOfMemory status immediately after being scheduled

kubernetes

I'm testing my application on a bare-metal Kubernetes cluster (version 1.22.1) and I'm having an issue when launching it as a Job.

My cluster has two nodes (master and worker), but the worker is cordoned. On the master node, 21GB of memory is available for the application.
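For reference, this is how I checked the node state and the memory figures above:

kubectl get nodes                # the worker shows SchedulingDisabled because it is cordoned
kubectl describe node master     # Capacity, Allocatable and the "Allocated resources" section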

I tried to launch my application as three different Jobs at the same time. Since I set 16GB of memory as both the resource request and limit, only a single Job started and the remaining two stayed in a Pending state. I have set backoffLimit: 0 for the Jobs.
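Each Job spec looks roughly like this (the name and image are placeholders; the resources and backoffLimit are the relevant parts):

apiVersion: batch/v1
kind: Job
metadata:
  name: app1
spec:
  backoffLimit: 0                          # do not retry the Pod after a failure
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: app
        image: registry.local/my-app:latest   # placeholder image
        resources:
          requests:
            memory: "16G"                  # what the scheduler reserves per Pod
          limits:
            memory: "16G"                  # hard limit enforced by the kubelet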

NAME            READY   STATUS     RESTARTS   AGE
app1--1-8pp6l   0/1     Pending    0          42s
app2--1-42ssl   0/1     Pending    0          45s
app3--1-gxgwr   0/1     Running    0          46s

After the first Pod completed, only one of the two Pending Pods should have been started, since there is only room for one at a time. However, one started and the other went into an OutOfMemory status, even though no container had ever been started in that Pod.

NAME            READY   STATUS        RESTARTS   AGE
app1--1-8pp6l   0/1     Running       0          90s
app2--1-42ssl   0/1     OutOfmemory   0          93s
app3--1-gxgwr   0/1     Completed     0          94s

The events of the OutOfMemory Pod are as follows:

Events:
  Type     Reason            Age                   From               Message
  ----     ------            ----                  ----               -------
  Warning  FailedScheduling  3m41s (x2 over 5m2s)  default-scheduler  0/2 nodes are available: 1 Insufficient memory, 1 node(s) were unschedulable.
  Normal   Scheduled         3m38s                 default-scheduler  Successfully assigned test/app2--1-42ssl to master
  Warning  OutOfmemory       3m38s                 kubelet            Node didn't have enough resource: memory, requested: 16000000000, used: 31946743808, capacity: 37634150400

It seems that the Pod was assigned to the node even though there was not enough memory for it, because the other Pod had only just started.

I assume this is not expected Kubernetes behavior. Does anyone know the cause of this issue?

Best Answer

This is a known issue in the 1.22.x versions - you can find multiple GitHub and Stack Overflow topics about it.

The fix for the issue is included in version 1.23:

  • Fix a regression where the Kubelet failed to exclude already completed pods from calculations about how many resources it was currently using when deciding whether to allow more pods. (#104577, @smarterclayton)

So the recommended solution is to upgrade your Kubernetes cluster to the newest stable version.
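For example, on a kubeadm-based bare-metal cluster (assuming that is how the cluster was set up), the upgrade looks roughly like this:

# run on the control-plane (master) node after installing the newer kubeadm package
kubeadm upgrade plan                  # lists the versions you can upgrade to
kubeadm upgrade apply v1.23.x         # substitute the latest 1.23 patch release
# then upgrade the kubelet/kubectl packages on each node and restart the kubelet
systemctl restart kubelet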

I hope this helps, but keep in mind that another, similar issue is still open on GitHub even with the fix applied (mentioned there about 10 days ago - status as of 13 January 2022):

Linking here for completeness - a similar symptom might get exposed after this fix as described in #106884. The kubelet considers resources for terminating pods to be in use (they are!), but the scheduler ignores terminating pods and schedules new pods. Because the kubelet now considers terminating pods, it rejects those rapidly rescheduled pods.

In that case, probably the only solution is to downgrade to version 1.21.
