Linux – CentOS rebooted for no apparent reason

centos linux

My server restarted and I can't find the reason. Any tips on what to look for would be appreciated.

Some info about the server:
– CentOS release 5.6 (Final)

What I've done so far:

last reboot | head
reboot system boot 2.6.18-xxx Mon Oct 3 12:32 (00:45)
I used sensors to check whether high temperature was the cause, but when I checked (about 3-5 minutes after the reboot) the temperature was Core 0: +65°C.

/var/log/messages has no information about the reboot. Here are a few lines from it:

Oct  2 20:50:01 p07 auditd[6738]: Audit daemon rotating log files
Oct  3 07:58:14 p07 auditd[6738]: Audit daemon rotating log files
Oct  3 12:32:40 p07 syslogd 1.4.1: restart.
Oct  3 12:32:40 p07 kernel: klogd 1.4.1, log source = /proc/kmsg started.
Oct  3 12:32:40 p07 kernel: Linux version 2.6.18-xxx (root@rhel5-build-x64) #1 SMP Thu Jul 21 19:23:22 MSD 2011
Oct  3 12:32:40 p07 kernel: Command line: ro root=/dev/md2 selinux=0
Oct  3 12:32:40 p07 kernel: BIOS-provided physical RAM map:

Best Answer

Try checking the BMC log to see whether a hardware error caused the reboot (log locations and their interpretation are probably best asked of the hardware vendor).
Does the server have a fence device? Any chance it has been fenced?
If you have a smart PDU, it may have logged power outages. If the server is hosted in a managed server farm, I'd ask the NOC team about outages as well.
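Before going to hardware logs, it may help to confirm that the reboot was not a clean shutdown. The sketch below is a heuristic, not a definitive test: it assumes the output of `last -x` (which, unlike `last reboot`, also lists shutdown records) is piped in newest first, and it checks whether a shutdown record sits just before the most recent boot. The function name `classify_last_boot` is made up for this example.

```shell
# Classify the most recent boot from `last -x` output read on stdin.
# Prints "clean" if a shutdown record precedes the latest boot,
# "unclean" if the previous record is another boot (no shutdown logged),
# and "unknown" if wtmp holds no older boot/shutdown record.
classify_last_boot() {
  awk '
    # Ignore ordinary login lines and runlevel changes.
    $1 != "reboot" && $1 != "shutdown" { next }
    # Skip forward until the newest boot record has been seen.
    !found { if ($1 == "reboot") found = 1; next }
    # The first boot/shutdown record older than that boot decides the result.
    { print ($1 == "shutdown" ? "clean" : "unclean"); done = 1; exit }
    END { if (!done) print "unknown" }
  '
}

# Example (not run here):
#   last -x | classify_last_boot
```

An "unclean" result points toward power loss, a hardware reset, or a kernel crash, which is where the BMC and PDU logs come in. If the BMC speaks IPMI and ipmitool happens to be installed, `ipmitool sel list` prints its System Event Log.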

Related Solutions

Linux – Apache server keeps crashing regularly

I'm not saying this is what's happening, but based on my own experience as a CentOS admin, it's most likely runaway Apache/PHP processes taking down the server. I've seen this numerous times on CentOS 5. It's frustrating because there's usually not a trace of what happened in the log files. The machine just grinds to a halt because physical memory and swap are sucked up by Apache/PHP processes. You would think Linux memory management or some daemon would jump in and say "hey, stop", but it doesn't. It'll let Apache grind your system to a halt.

Having said that, to see what's happening you'll need something that can monitor and log resource usage. I like to use a program called atop. Atop is a lot like the top program but it also takes a snapshot of resource usage at defined intervals. It's pretty simple to install.

wget http://www.atcomputing.nl/Tools/atop/packages/atop-1.23.tar.gz 
tar -zxvf atop-1.23.tar.gz
cd atop-1.23 && make install

Open /etc/atop/atop.daily with a text editor and change INTERVAL=600 to INTERVAL=60
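If you would rather script that edit than open an editor, something like the following could work. It is a sketch that assumes the file contains a line starting with INTERVAL= followed by a number, as described above; the function name `set_atop_interval` is made up for this example.

```shell
# Set the sampling interval (in seconds) in an atop.daily-style script.
# Usage: set_atop_interval /etc/atop/atop.daily 60
set_atop_interval() {
  file=$1
  seconds=$2
  tmp=$(mktemp) || return 1
  # Rewrite only the INTERVAL assignment, whatever its current value is.
  # Writing back with cat keeps the original file's owner and mode.
  sed "s/^INTERVAL=[0-9][0-9]*/INTERVAL=${seconds}/" "$file" > "$tmp" &&
    cat "$tmp" > "$file"
  status=$?
  rm -f "$tmp"
  return $status
}
```

Only the numeric value is replaced, so any other settings in the file stay as they were.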

Run the command /etc/atop/atop.daily from a command prompt to start it. Wait a few minutes and run atop -r /var/log/atop/atop_20091118 with the correct date of course.
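To avoid typing the date by hand, the log name can be built from today's date. This is a small sketch assuming the /var/log/atop/atop_YYYYMMDD naming shown above; `atop_log_for` is a made-up helper name.

```shell
# Print the atop log path for a given day (YYYYMMDD).
# Defaults to today's date when no argument is given.
atop_log_for() {
  day=${1:-$(date +%Y%m%d)}
  printf '/var/log/atop/atop_%s\n' "$day"
}

# Example (not run here):
#   atop -r "$(atop_log_for)"            # today's log
#   atop -r "$(atop_log_for 20091118)"   # a specific day
```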

Hit the t key to go forward in time and T to go back. Next time your server crashes do this and check the MEM free and SWP free lines. If you have memory problems these will be in red. Also look for numerous httpd lines under CMD. If apache/php is your problem there'll be a bunch of them.

If this is the case, I recommend looking at your MaxClients setting in httpd.conf. If it is set too high, Apache will gladly eat all of your memory, causing your machine to crash. Apache/PHP can easily eat 40-50 MB per process. If you multiply 40 MB by MaxClients you'll get a rough idea of how much memory Apache can potentially use. MaxClients usually defaults to 150 on CentOS, so Apache can potentially use 6 GB of memory by default. This doesn't include the memory your system needs for itself and for other processes. Try setting it to a more realistic value based on the amount of memory you have, such as 40 if you have 2 GB of memory, and see if that helps. Also, if you have KeepAlive On, set KeepAliveTimeout to a low number like 2 or 3.
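The arithmetic can be sketched in shell. The function names are made up for this example, and the amount of memory to reserve for the rest of the system is an assumption you have to pick yourself; the RSS average also overstates real usage somewhat, because memory shared between httpd processes is counted once per process.

```shell
# Worst-case Apache memory in MB: per-process size times MaxClients.
# Usage: apache_worst_case_mb PER_PROCESS_MB MAX_CLIENTS
apache_worst_case_mb() {
  echo $(( $1 * $2 ))
}

# Largest MaxClients that fits in RAM after reserving some for the system.
# Usage: suggest_max_clients RAM_MB RESERVE_MB PER_PROCESS_MB
# RESERVE_MB is an assumed figure, not something Apache reports.
suggest_max_clients() {
  echo $(( ($1 - $2) / $3 ))
}

# Average resident size in MB, reading one RSS value in KB per line.
# Typical input (not run here): ps -C httpd -o rss=
avg_rss_mb() {
  awk '{ sum += $1; n++ }
       END { if (n) printf "%d\n", sum / n / 1024; else print 0 }'
}
```

For example, `apache_worst_case_mb 40 150` gives 6000 (about 6 GB), and `suggest_max_clients 2048 448 40` gives 40 for a 2 GB machine with roughly 450 MB held back for everything else.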

In my opinion CentOS's Apache/PHP build is a real pos that should never have seen the light of day. It's buggy and crash prone. If you run a serious site, I highly recommend compiling your own version of Apache/PHP or even using one of the newer high-performance web servers like lighttpd or nginx with FastCGI PHP.

Linux – CentOS Server won’t reboot when issuing reboot command

You should perform a ps aux to see if any of the shutdown scripts are hung waiting for a process to finish. It should look something like this:

/etc/rc6.d/K##procname

You can try manually issuing a kill command for that hung script. Strange though, since there's a timeout set on the scripts where it will force a -KILL signal to any leftover process.
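A quick way to pick those scripts out of the process list is to filter the `ps aux` output for K-script paths. This is a sketch: `list_kill_scripts` is a made-up name, and it assumes the standard `ps aux` column layout, where the PID is column 2 and the command starts at column 11.

```shell
# Print "PID COMMAND" for shutdown (K##) scripts that are still running.
# Reads `ps aux` output on stdin.
list_kill_scripts() {
  awk '/rc[0-6]\.d\/K[0-9][0-9]/ {
    # Rebuild the full command line from column 11 onward.
    cmd = $11
    for (i = 12; i <= NF; i++) cmd = cmd " " $i
    print $2, cmd
  }'
}

# Example (not run here):
#   ps aux | list_kill_scripts
#   kill PID        # then kill -KILL PID if it does not go away
```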

Also, what's the uptime of the server/box? I've experienced an issue in the past where a box that has an uptime of over a year refuses to shut down. In that case, I've killed each process manually, run sync several times to flush all data to disk and forced a reboot (power cycle).
