Linux – How to troubleshoot an unexpected Linux shutdown

linux redhat shutdown

I have two RHEL 6.3 servers that just shut down at nearly the same time (25 seconds apart) for no apparent reason. They're on conditioned power, along with a number of other servers that didn't shut down, so it can't be the power. The room is properly chilled, and both of them shutting off at the same moment due to temperature seems unlikely.

At the time of the shutdown, both servers logged the following in /var/log/secure. I don't know what it means, but I found it peculiar:

Apr 10:42:52 localhost polkitd(authority=local): Unregistered Authentication Agent for session /org/freedesktop/ConsoleKit/Session1 (system bus name :1.25, object path /org/gnome/PolicyKit1/AuthenticationAgent, locale en_US.UTF-8) (disconnected from bus)

In /var/log/messages, both systems show what looks like a clean shutdown request:

Apr 10 10:42:52 localhost init: tty (/dev/tty2) main process (6183) killed by TERM signal
Apr 10 10:42:52 localhost init: tty (/dev/tty3) main process (6186) killed by TERM signal
Apr 10 10:42:52 localhost init: tty (/dev/tty4) main process (6188) killed by TERM signal
Apr 10 10:42:52 localhost init: tty (/dev/tty5) main process (6190) killed by TERM signal
Apr 10 10:42:52 localhost init: tty (/dev/tty6) main process (6192) killed by TERM signal

So I checked the output of last to see whether anyone had logged in to do that. Both servers have this entry, with no logins for days beforehand:

reboot     system boot   2.6.32-279.el6.x Thu Apr 10 10:42 - 10:42  (00:00)

So if no one logged in to shut them down, and the two people who were on site watched the servers shut down and confirmed that no one touched either of them, what else could cause this shutdown? Where else should I look for clues?

Best Answer

Assuming this was a kernel oops or panic, you need to capture the output from the server console to understand exactly what happened. You can do so using:

Things to double-check on all servers for this to work:

  • Check the kernel.panic option in /etc/sysctl.conf, which controls how many seconds Linux waits before rebooting after a kernel panic
  • Check the kernel log level via kernel.printk in /etc/sysctl.conf; for more debug output, the recommended setting is: echo 'kernel.printk = 8 4 1 7' >> /etc/sysctl.conf
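The two checks above can be scripted. What follows is only a minimal sketch, assuming the persistent settings live in /etc/sysctl.conf and the running values are readable under /proc/sys. The helper names conf_value and report_panic_settings are made up for illustration and are not standard tools.

```shell
#!/bin/sh
# Sketch: compare the running kernel.panic / kernel.printk values with the
# persistent ones. conf_value and report_panic_settings are hypothetical
# helper names used only for this example.

# Print the last value assigned to KEY ($2) in a sysctl.conf-style FILE ($1).
# Returns non-zero if the key is not set in the file.
conf_value() {
    awk -F= -v key="$2" '
        /^[ \t]*[#;]/ { next }                 # skip comment lines
        {
            k = $1
            gsub(/[ \t]/, "", k)               # strip whitespace around the key
            if (k == key && index($0, "=") > 0) {
                v = $0
                sub(/^[^=]*=[ \t]*/, "", v)    # drop the "key =" prefix
                sub(/[ \t]+$/, "", v)          # drop trailing whitespace
                val = v                        # a later assignment overrides an earlier one
                found = 1
            }
        }
        END { if (found) print val; else exit 1 }
    ' "$1"
}

# Show running and persistent values for both keys.
# $1 = config file to inspect (assumed default: /etc/sysctl.conf).
report_panic_settings() {
    conf="${1:-/etc/sysctl.conf}"
    for key in kernel.panic kernel.printk; do
        # kernel.panic -> /proc/sys/kernel/panic
        path="/proc/sys/$(printf '%s' "$key" | tr . /)"
        if [ -r "$path" ]; then
            printf '%s running:    %s\n' "$key" "$(cat "$path")"
        else
            printf '%s running:    (unreadable)\n' "$key"
        fi
        if [ -r "$conf" ] && value="$(conf_value "$conf" "$key")"; then
            printf '%s persistent: %s\n' "$key" "$value"
        else
            printf '%s persistent: (not set in %s)\n' "$key" "$conf"
        fi
    done
    return 0
}

report_panic_settings
```

A running kernel.panic value of 0 means the kernel does not reboot automatically after a panic. After editing /etc/sysctl.conf, running sysctl -p as root loads the new values without a reboot.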