Linux server stopped responding

linux

One of our RedHat Linux servers simply stopped responding for a few minutes. For that period there is absolutely no entry in the log files (under /var/log/, e.g. messages) or in the application log files. What else could I check?

During that period the users could not reach the application and I could not ssh to the server. I cannot recall whether I tried to ping it.

After that, everything started working as expected!

Best Answer

Do you have any sort of trending or monitoring running against this box? If not, it may be very difficult to diagnose. This behavior could be caused by any number of things. Here are a few ideas off the top of my head:

  • transient network glitch (broadcast storm, routing loop, spanning tree topology change, etc)
  • I/O contention (did something consume all of the server's RAM, pushing it heavily into swap?)
  • did the server reboot?
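Two of these ideas can be checked after the fact from /proc. The following is only a rough sketch: the function names and the two-hour incident window are made up for illustration, and the swap figure shows the current state only, not what happened during the outage (though swap that is still heavily used afterwards can hint at earlier memory pressure).

```python
from datetime import datetime, timedelta

def boot_time(uptime_text, now):
    """Return the boot time implied by the contents of /proc/uptime."""
    # The first field of /proc/uptime is seconds since boot.
    uptime_seconds = float(uptime_text.split()[0])
    return now - timedelta(seconds=uptime_seconds)

def rebooted_since(uptime_text, now, incident_start):
    """True if the machine booted at or after incident_start."""
    return boot_time(uptime_text, now) >= incident_start

def swap_used_kb(meminfo_text):
    """Return swap currently in use (kB) from the contents of /proc/meminfo."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[key.strip()] = int(parts[0])
    return fields["SwapTotal"] - fields["SwapFree"]

if __name__ == "__main__":
    now = datetime.now()
    # Hypothetical window: adjust to when the outage actually began.
    incident_start = now - timedelta(hours=2)
    with open("/proc/uptime") as f:
        uptime_text = f.read()
    with open("/proc/meminfo") as f:
        meminfo_text = f.read()
    print("booted at:", boot_time(uptime_text, now))
    print("rebooted since incident:", rebooted_since(uptime_text, now, incident_start))
    print("swap in use (kB):", swap_used_kb(meminfo_text))
```

If the boot time falls inside the outage window, the reboot question is answered; a network glitch cannot be confirmed this way and needs data from the switches or from another host.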

Going forward, I'd highly recommend getting something like Munin set up. With Munin, you'll be able to easily keep tabs on disk IO, memory usage, CPU usage, process count, network traffic, etc. Having this information makes it much easier to troubleshoot this sort of problem. Alternatively, you can install and set up sar (part of the sysstat package), which gathers much of the same data but logs it to daily files on the server (typically under /var/log/sa on RedHat), which you can inspect after the fact with the sar command.
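Until a proper tool is in place, even a crude sampler gives you something to look at next time. This is a stopgap sketch, not a substitute for Munin or sar; the log path, the 10-second interval, and the function names are all assumptions chosen for illustration.

```python
import os
import time

def parse_meminfo(text):
    """Map /proc/meminfo field names to their integer values (mostly kB)."""
    fields = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[key.strip()] = int(parts[0])
    return fields

def format_sample(epoch, load1, meminfo):
    """Build one log line: UTC timestamp, 1-minute load, free memory, free swap."""
    stamp = time.strftime("%Y-%m-%dT%H:%M:%S", time.gmtime(epoch))
    return "%sZ load1=%.2f memfree_kb=%d swapfree_kb=%d" % (
        stamp, load1, meminfo.get("MemFree", 0), meminfo.get("SwapFree", 0))

def run(log_path, interval_seconds):
    """Append one sample per interval until interrupted."""
    while True:
        with open("/proc/meminfo") as f:
            meminfo = parse_meminfo(f.read())
        load1 = os.getloadavg()[0]
        with open(log_path, "a") as log:
            log.write(format_sample(time.time(), load1, meminfo) + "\n")
        time.sleep(interval_seconds)

if __name__ == "__main__":
    # Hypothetical path and interval; pick whatever suits the box.
    run("/tmp/health-samples.log", 10)
```

The timestamps are useful on their own: if samples keep arriving through an outage, the host was alive and the network becomes the prime suspect, whereas a gap in the samples suggests the machine itself stalled.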
