Linux – How to find out why a server hangs but is still reachable with ping

debian linux server-crashes

One of my servers, which runs in a German data center, "hangs" every night, but I can't find out why. There are no errors in /var/log/messages or /var/log/syslog.

The server responds to ping, but all services are down (ssh, apache, …). After a reset everything runs normally.

A hardware test has been performed and found nothing, so it looks like a software issue.

Best Answer

I'd leave some lightweight profiling commands running and logging to files, so that after the fact you can look at what the system was doing when it hung. For example:

nohup top -b -d 60 >> top.log & # runs every 60 seconds
nohup vmstat 5 >> vmstat.log &
nohup iostat 5 >> iostat.log &

nohup is there so they aren't killed when you lose connection to the server. You can also use screen for that.
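
If you want each sample timestamped and flushed to disk before the next hang, a small wrapper can help. This is only a minimal sketch, not part of the commands above: the function name snapshot, the LOG path, and the 60-second interval are assumptions, and it relies on Linux's /proc and the procps version of ps.

```shell
#!/bin/sh
# Hypothetical helper: append one timestamped system snapshot to a log file.
# The default log path is an assumption; override it by setting LOG.
LOG="${LOG:-./hang-snapshot.log}"

snapshot() {
    {
        printf '=== %s ===\n' "$(date '+%Y-%m-%d %H:%M:%S')"
        # Load averages and a short memory summary, read from /proc (Linux only).
        cat /proc/loadavg
        grep -E '^(MemFree|MemAvailable|SwapFree):' /proc/meminfo
        # Header line plus the ten processes using the most CPU right now.
        ps -eo pid,ppid,stat,pcpu,pmem,comm --sort=-pcpu | head -n 11
    } >> "$LOG" 2>&1
    # Flush buffers so the last entries are more likely to survive a hard reset.
    sync
}

# To sample continuously (interval is an assumption), run for example:
#   while true; do snapshot; sleep 60; done
```

Start the loop under nohup or inside screen, the same way as the commands above. After the next reset, the last block in the log shows roughly what the machine looked like shortly before it stopped responding.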

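A more robust alternative to the last two commands would be to set up sar (from the sysstat package), which collects the same kind of statistics in the background and keeps a history.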
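
As a rough sketch of what using sar could look like on Debian: the setup steps are shown as comments because they change the system, and details such as the ENABLED line in /etc/default/sysstat and the /var/log/sysstat/saDD data files are the usual Debian defaults, assumed here rather than checked on your machine. The helper name sar_window is hypothetical.

```shell
# One-time setup on Debian, run as root (assumed defaults, adjust as needed):
#   apt-get install sysstat
#   sed -i 's/^ENABLED="false"/ENABLED="true"/' /etc/default/sysstat
#   systemctl restart sysstat

# Print CPU, memory, I/O, and load history for a time window
# from one of sar's daily data files.
sar_window() {
    file=$1    # e.g. /var/log/sysstat/sa07 for the 7th of the month
    start=$2   # start of the window, hh:mm:ss
    end=$3     # end of the window, hh:mm:ss
    for report in -u -r -b -q; do
        sar "$report" -f "$file" -s "$start" -e "$end"
    done
}

# Example call (file name and times are placeholders):
#   sar_window /var/log/sysstat/sa07 02:00:00 04:00:00
```

Pick a window that ends just before the server stopped responding. A steady climb in memory use, I/O, or run queue length in the last samples usually points at what to look into next.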