Linux – Two CPU cores at 100% utilization after a soft lockup on two CPUs: is this normal?

Tags: central-processing-unit, cpu-usage, htop, linux

I had two of my CPUs lock up on one of my servers. From dmesg:

BUG: soft lockup - CPU#1 stuck for 23s! [vmx-vcpu-0:6148]

and later:

BUG: soft lockup - CPU#2 stuck for 23s! [vmx-vcpu-0:6148]

I'm trying to figure out why this would happen; the processor has 4 cores with hyperthreading, so the OS sees it as 8 cores. But my main question is related to this:

When looking at htop post-freeze from SSH, I see that CPUs #2 and #3 (guessing these correspond to #1 and #2 from dmesg) are both stuck at 100% with apparently no processes using them:

[htop screenshot]

None of the processes were using more than 5% CPU. Why would these display 100% utilization? Are they still considered locked by the kernel?

Best Answer

As the message reports, this is a bug in kernel-level code.

Those CPUs are stuck in kernel code (running as vmx-vcpu-0, per your dmesg output) that has not yielded control of the CPU for a long period of time: 23 seconds in each of the reports you quoted.
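To see where the time on those two CPUs is being accounted, you can look at the per-CPU tick counters in /proc/stat, which is where tools like htop get their per-CPU meters. The following is only a rough sketch, not a diagnostic tool: the function names are made up for illustration, and it assumes the usual field order (user, nice, system, idle, iowait, irq, softirq, steal). If nearly all of the delta lands in "system", that would be consistent with the CPU spinning in kernel code rather than in a normal user process.

```python
import time

# Field order of the per-CPU lines in /proc/stat (first eight columns).
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_proc_stat(text):
    """Return {cpu_name: {field: ticks}} for the per-CPU lines of /proc/stat."""
    cpus = {}
    for line in text.splitlines():
        parts = line.split()
        # Keep "cpu0", "cpu1", ...; skip the aggregate "cpu" line and others.
        if not parts or not parts[0].startswith("cpu") or parts[0] == "cpu":
            continue
        values = [int(v) for v in parts[1:1 + len(FIELDS)]]
        cpus[parts[0]] = dict(zip(FIELDS, values))
    return cpus


def busy_breakdown(before, after):
    """Percentage of elapsed ticks per category for each CPU between two samples."""
    result = {}
    for cpu, new in after.items():
        old = before.get(cpu)
        if old is None:
            continue
        deltas = {field: new[field] - old[field] for field in new}
        total = sum(deltas.values())
        if total <= 0:
            continue  # no ticks elapsed; nothing to report for this CPU
        result[cpu] = {field: 100.0 * d / total for field, d in deltas.items()}
    return result


def sample_breakdown(interval=1.0, path="/proc/stat"):
    """Take two samples `interval` seconds apart and return the breakdown."""
    with open(path) as fh:
        first = parse_proc_stat(fh.read())
    time.sleep(interval)
    with open(path) as fh:
        second = parse_proc_stat(fh.read())
    return busy_breakdown(first, second)
```

Calling `sample_breakdown()` on the affected server and printing the entries for `cpu1` and `cpu2` (the kernel counts CPUs from 0) shows how the busy time splits between user, system and the other categories.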

As for what to do: open a ticket with VMware. vmx-vcpu-0 looks like their code, but I'm not totally sure.
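If you do open a ticket, it may help to attach a summary of every lockup report in the log. Here is a minimal sketch that pulls the CPU number, duration, task name and PID out of lines in the format you posted; the function names are hypothetical, and the pattern is only matched against that one message format.

```python
import re
from collections import Counter

# Matches e.g. "BUG: soft lockup - CPU#1 stuck for 23s! [vmx-vcpu-0:6148]",
# with or without a leading dmesg timestamp.
LOCKUP_RE = re.compile(
    r"BUG: soft lockup - CPU#(\d+) stuck for (\d+)s! \[(.+):(\d+)\]"
)


def parse_soft_lockups(log_text):
    """Return one dict per soft lockup report found in the log text."""
    events = []
    for line in log_text.splitlines():
        match = LOCKUP_RE.search(line)
        if match:
            events.append({
                "cpu": int(match.group(1)),
                "seconds": int(match.group(2)),
                "task": match.group(3),
                "pid": int(match.group(4)),
            })
    return events


def count_by_task(events):
    """Count reports per (task name, PID) pair."""
    return Counter((event["task"], event["pid"]) for event in events)
```

Feeding it the saved output of `dmesg` gives a list you can paste into the ticket, and `count_by_task` shows whether the same task is behind every report.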