CentOS – Server Crashes Every 2 Weeks at the Exact Same Time

centos, MySQL

A few months ago our server started to crash every 14 days, at exactly the same time (around 11:04 each time). We're pretty sure this isn't a hardware failure, since hardware failures tend to be random.

The server suddenly stops responding and reboots itself after a few seconds. None of the logs contain any related information, and we're 100% sure there is no cron job on the server that could cause this.

Has anyone ever faced this kind of problem? We're extremely frustrated by this weird behavior, since there's not a single clue as to what's wrong…

I've also taken a video of the server right before it crashes. As you can see from it, nothing seems wrong…

Update 11-Apr-2011:

2 weeks ago: To narrow down the possibilities, we shut the server down (shutdown -h now) 5 minutes before the next expected occurrence. Magically, the server booted by itself at the expected time. After that, our DC moved the server to another PDU port, and we thought that would finally solve the issue.

Today: The server crashed again, at the exact same time!! Our DC said other servers on the same PDU do not have this issue. Now we're really confused: if it's neither the PDU nor our server, what could it be?

Best Answer

From the video it looks like a cold reboot, and as you said, there is nothing in the logs. All I can think of is the SysRq "magic" key or a faulty KVM card, if no other servers on the same UPS are experiencing the same thing.

A buggy or faulty system monitoring process could be doing this on specific days and hours. This should be fun to track down.

The first step would be to change the OS date and time and see whether it still reboots on its own, so you can narrow it down.
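To plan that clock test, it can help to work out when the next occurrence is due. Below is a minimal Python sketch, assuming the 14-day interval and the approximate 11:04 crash time from the question. The function names and the 10-minute lead are hypothetical choices for illustration, not anything taken from the server.

```python
from datetime import datetime, timedelta

# Assumption: crashes repeat every 14 days, as described in the question.
PERIOD = timedelta(days=14)


def next_occurrence(last_crash: datetime, now: datetime) -> datetime:
    """Return the first expected crash time strictly after `now`."""
    if now < last_crash:
        return last_crash
    # Whole 14-day periods elapsed since the last known crash.
    elapsed_periods = (now - last_crash) // PERIOD
    return last_crash + (elapsed_periods + 1) * PERIOD


def clock_for_test(expected: datetime,
                   lead: timedelta = timedelta(minutes=10)) -> datetime:
    """Return a clock value shortly before the expected crash.

    Setting the OS clock to this value (hypothetical 10-minute lead)
    lets you watch whether the reboot follows the OS time.
    """
    return expected - lead


if __name__ == "__main__":
    # Approximate time of the crash reported in the 11-Apr-2011 update.
    last = datetime(2011, 4, 11, 11, 4)
    now = datetime(2011, 4, 15, 9, 0)  # illustrative "current" time
    expected = next_occurrence(last, now)
    print("next expected crash:", expected)
    print("set OS clock to:    ", clock_for_test(expected))
```

If the reboot follows the shifted clock, the trigger is probably something that reads the OS time. If it still happens at the real-world time, a cause outside the OS seems more likely.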
