When do Dell PowerEdge servers (R210II and R620) automatically shut down due to overheating?

Tags: automation, dell-poweredge, drac, physical-environment

I've had a hell of a time trying to find out when and how a Dell PowerEdge server (in my case, a bunch of R210IIs and R620s with iDRACs) deals with overheating. I don't want to wait for the CPU to protect itself. Ideally, the server itself would respond to sustained high temperatures by issuing an IPMI command to the OS to power down before a critical threshold is reached. For example: at 55C, issue the IPMI command to the OS; if the server reaches 80C, pull the plug; and so on.
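As a rough sketch of the policy I have in mind (this is not something Dell provides; the function name, the 55C/80C defaults, and the "three consecutive readings" rule are illustrative assumptions only):

```python
from enum import Enum


class Action(Enum):
    NONE = "none"
    GRACEFUL = "graceful shutdown"
    HARD = "hard power off"


def decide(readings_c, warn_c=55.0, crit_c=80.0, sustained=3):
    """Pick an action from a chronological list of temperature readings.

    Hypothetical policy, not Dell firmware behaviour:
    - the latest reading at or above crit_c means cut power immediately;
    - the last `sustained` readings all at or above warn_c means ask the
      OS to shut down gracefully;
    - anything else means do nothing.
    """
    if not readings_c:
        return Action.NONE
    if readings_c[-1] >= crit_c:
        return Action.HARD
    recent = readings_c[-sustained:]
    if len(recent) == sustained and all(t >= warn_c for t in recent):
        return Action.GRACEFUL
    return Action.NONE
```

Requiring several consecutive readings above the warning level is only there so a single brief spike does not take the server down.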

The problem is that all of Dell's documentation is unclear on when or how a server shutdown from overheating occurs.

My question is whether Dell supports graceful thermal shutdown like this, or whether there is just some fine print or unclear documentation on the critical temperature at which the server simply pulls its own plug. Is Dell OpenManage necessary to support this?

I really would like to avoid having to run a dedicated management server plugged into the various networks (trying to avoid bridging between networks through a single management point) to remotely manage shutdown like this. It would be a single point of failure which is also subject to the same hardcoded or inflexible thermal conditions as my servers themselves.

My R620s have iDRACs in them. I included them for the iDRAC's remote management features, but at this point I'm disappointed that the iDRAC seems incapable of handling this. Its thermal settings are limited to controlling fan speeds, and the horrible documentation and in-system help don't actually say when a shutdown could occur.

Any real world advice is greatly appreciated! Thank you.

Best Answer

The best I could find was from a thread on Spiceworks forums. The response is from a Dell representative:

There are a lot of ways to do this. You are correct that by default none of the options for a graceful shutdown are enabled, but the server will shut down if a critical threshold is met.

You can set alert actions within the iDRAC/CMC. You can set it to power off when a temperature warning or critical threshold is met. You can also set platform events or alert actions within OMSA. There is also a section in OMSA under shutdown for thermal. You can set it to perform an action there as well. Also, you can configure OMSA to execute a program if an event is triggered. You can use that feature to execute the shutdown program within Windows.

The Power Off option in the alert actions is a graceful shutdown. I recommend that you set it to shutdown on the warning threshold. If you configure it for the critical threshold it may attempt a graceful shutdown and then hit the critical limit and perform a hard shutdown before a graceful shutdown can be completed.
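To make the reasoning behind that recommendation concrete, here is a minimal back-of-the-envelope sketch. The function name and all of the numbers are hypothetical, and it assumes the temperature rises at a constant rate, which real hardware will not do exactly:

```python
def graceful_shutdown_fits(trigger_c, crit_c, rise_c_per_min, shutdown_min):
    """Return True if a graceful shutdown started at trigger_c would finish
    before the temperature reaches crit_c.

    Assumes a constant rate of temperature rise (an illustrative
    simplification, not a model of any specific server).
    """
    if rise_c_per_min <= 0:
        # Temperature is flat or falling, so the critical limit is never hit.
        return True
    minutes_until_critical = (crit_c - trigger_c) / rise_c_per_min
    return shutdown_min < minutes_until_critical


# Triggering at a warning level well below critical leaves headroom.
print(graceful_shutdown_fits(55, 80, rise_c_per_min=5, shutdown_min=3))
# Triggering at the critical level itself leaves none.
print(graceful_shutdown_fits(80, 80, rise_c_per_min=5, shutdown_min=3))
```

With these made-up figures, a trigger at 55C gives 5 minutes before 80C is reached, enough for a 3-minute shutdown, while a trigger at 80C gives no margin at all.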

I also read an official Dell PDF on OpenManage that mentions thermal shutdown:

Dell OpenManage Server Administrator (OMSA) enables administrators to set temperature thresholds at which servers should perform an emergency thermal shutdown.

So the answer appears to be yes: Dell servers do support graceful thermal shutdown, and the temperature at which it triggers is configurable. You can use OpenManage Server Administrator on each server to make these changes (I believe you can make them while the server is running). You should not need to install a centralized OpenManage management server, though it can simplify a lot of other management tasks.

:EDIT:
I should add that these answers are generic for Dell servers. I did not find anything specific to the server models you listed.