High CPU Temperature and Temperature Difference – Troubleshooting Guide

Tags: cpu-usage, lm-sensors, Ubuntu

One node of a cluster I am working on is showing high CPU temperatures.

The node has two Intel(R) Xeon(R) E5-2680 v4 CPUs @ 2.40GHz.

The sensors command from lm-sensors shows one CPU at around 70°C and the other at around 90°C. The load is 100%. The node is in fact overloaded, but the load cannot be reduced. The temperature is highly correlated with the load.
The current frequency is higher than the max frequency (max: 2400000, cur: 5280000), so I do not think there is any throttling.
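One way to check for throttling more directly than by comparing frequencies is to read the kernel's thermal throttle counters. The following is a minimal sketch, assuming a Linux system with an Intel CPU that exposes the usual cpufreq and thermal_throttle files under /sys/devices/system/cpu (values in kHz for the frequencies). The function names cpu_report and throttled_cpus are made up for this example, and files that are missing on a given system are simply reported as None.

```python
from pathlib import Path


def read_int(path):
    """Return the integer stored in a sysfs file, or None if it is missing or unreadable."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def cpu_report(sysfs_root="/sys/devices/system/cpu"):
    """Collect frequencies (kHz) and thermal throttle counters for every cpuN directory.

    Assumption: the standard Linux sysfs layout (cpufreq/ and thermal_throttle/
    subdirectories); a file that does not exist yields None.
    """
    report = {}
    for cpu_dir in sorted(Path(sysfs_root).glob("cpu[0-9]*")):
        report[cpu_dir.name] = {
            "cur_khz": read_int(cpu_dir / "cpufreq" / "scaling_cur_freq"),
            "max_khz": read_int(cpu_dir / "cpufreq" / "cpuinfo_max_freq"),
            "core_throttle_count": read_int(
                cpu_dir / "thermal_throttle" / "core_throttle_count"
            ),
            "package_throttle_count": read_int(
                cpu_dir / "thermal_throttle" / "package_throttle_count"
            ),
        }
    return report


def throttled_cpus(report):
    """Return the names of logical CPUs whose throttle counters are non-zero."""
    return [
        name
        for name, values in report.items()
        if (values["core_throttle_count"] or 0) > 0
        or (values["package_throttle_count"] or 0) > 0
    ]
```

Calling throttled_cpus(cpu_report()) on the node returns the logical CPUs that have been thermally throttled at least once since boot. The counters are cumulative, so comparing two readings taken a few minutes apart under load shows whether throttling is happening now.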

Is the temperature difference a sign of a cooling issue?
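To track the difference between the two packages over time rather than reading it off the terminal, the JSON output of lm-sensors can be parsed. This is only a sketch, assuming a version of lm-sensors that supports `sensors -j` and a coretemp driver that labels the package sensors "Package id N" (older kernels use "Physical id N"). The helper names package_temps, temp_spread and read_sensors are hypothetical.

```python
import json
import subprocess


def package_temps(sensors_json):
    """Map 'chip/label' to the package temperature in °C.

    Assumption: the input has the structure produced by `sensors -j`, i.e.
    chip name -> feature label -> {"tempN_input": value, ...}.
    """
    temps = {}
    for chip, features in sensors_json.items():
        if not isinstance(features, dict):
            continue
        for label, readings in features.items():
            # Non-sensor entries such as "Adapter" are plain strings; skip them.
            if not isinstance(readings, dict):
                continue
            if not label.startswith(("Package id", "Physical id")):
                continue
            for key, value in readings.items():
                if key.endswith("_input"):
                    temps[f"{chip}/{label}"] = float(value)
    return temps


def temp_spread(temps):
    """Return the difference between the hottest and coolest package, in °C."""
    if len(temps) < 2:
        return 0.0
    return max(temps.values()) - min(temps.values())


def read_sensors():
    """Run `sensors -j` and return its parsed JSON output."""
    result = subprocess.run(
        ["sensors", "-j"], capture_output=True, text=True, check=True
    )
    return json.loads(result.stdout)
```

Logging temp_spread(package_temps(read_sensors())) at regular intervals, ideally together with the load on each socket, shows whether the gap stays constant or only appears under load.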

The Intel documentation lists a case temperature (Tcase) of 86°C. From what I understand, this means that running at 90°C will shorten the lifespan of the CPU.

The temperatures have been like this for almost a week. Should I look into a solution, such as reducing the CPU speed, to lower the CPU temperature? The node will probably run other CPU-intensive jobs in the future.

Best Answer

Running a CPU at those temperatures is within spec, but it will most likely shorten the lifespan of your components. You should definitely look into scaling, both horizontally (more nodes) and vertically (more capable nodes), to reduce the load. If the cluster is on-premises, you could also check whether more efficient cooling options are available.
