Linux – What steps should I take to determine the root cause of a Linux server failure?

linux troubleshooting

I am sorry if this question has been addressed before. I assume it has, but after half an hour of searching I couldn't find anything.

Anyway, to the question:

I am a Windows guy and a self-taught programmer, so I am very new to Linux, but I am liking it more than Windows. We have a small WordPress installation that fails seemingly at random. When it does, I cannot SSH in, and my only real option is to do a hard reboot from the Rackspace Cloud admin. That has always fixed the problem.

I want to know what I should be doing to determine what actually caused the problem, though. This is a trivial example, but we plan to put more applications on Linux in the next year or so, and I want to be comfortable dealing with problems in a more scientific way than "unplug it and plug it back in."

Where should I get started? I am open to books, blog posts, Server Fault questions, videos, seminars, college classes, anything.

Thanks!

Best Answer

This is a general recipe, and it applies beyond Linux.

Identifying problems, in order:

1. Remote login problems:
   a. network problems
   b. remote login daemon problems (sometimes it can take minutes to log in with ssh)
2. Load problems (uptime; df -h; free -m)
3. Read the logs. They are in /var/log/. The system-wide logs are /var/log/messages and /var/log/syslog. In your case, the web server logs may also be of interest (/var/log/apache, or /var/log/apache2 or /var/log/httpd on many distributions).

If you hard reboot your server, write down the time you did it, so that you can check the logs from just before that time.
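
The checks above can be gathered into a couple of shell functions to run once the server is reachable again. This is only a sketch: snapshot and logs_at are hypothetical helper names, the log paths vary by distribution, and the timestamp prefix assumes the traditional syslog format (for example "Mar  3 14:0").

```shell
#!/bin/sh
# Hypothetical helpers; names and log paths are assumptions, adjust to your system.

# Print load, disk and memory figures in one go.
snapshot() {
    echo "== load ==";   uptime
    echo "== disk ==";   df -h
    echo "== memory =="; free -m
}

# Print every line that contains the given timestamp text from each
# readable log file; files that do not exist are skipped.
logs_at() {
    prefix="$1"; shift
    for f in "$@"; do
        [ -r "$f" ] || continue
        grep -hF -- "$prefix" "$f" || true
    done
}
```

For a hard reboot at about 14:07 on 3 March, logs_at "Mar  3 14:0" /var/log/messages /var/log/syslog would show what was logged between 14:00 and 14:09, which is the window to read first.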

Related Solutions

Email delivery is slow between us and counterparty. What can I do to determine root cause

Your SMTP logs should give you the information you need. They should tell you:

Whether you are greylisted (some mail systems make you keep retrying for a period of time to discourage spammers)
Whether you are on an RBL (or at least whether the receiving server treats you as listed)
Whether the receiving mail server is too busy and is delaying reception
How many attempts the sending server makes, and how often (perhaps you only retry every 20 minutes?)
The exact time your mail was received by the receiving server

This should be all you need to determine the source of the delay.
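
As a rough illustration of reading those logs, the sketch below assumes a Postfix-style mail log, where each delivery attempt is one line containing to=<recipient> and status=sent, deferred or bounced. Other mail servers log differently, and attempts_for and status_counts are hypothetical helper names.

```shell
#!/bin/sh
# Assumes a Postfix-style log: one line per delivery attempt, containing
# "to=<recipient>" and "status=sent|deferred|bounced".

# Show every delivery attempt logged for one recipient, in log order.
attempts_for() {
    grep -F -- "to=<$1>" "$2" || true
}

# Count delivery attempts per status value.
status_counts() {
    awk 'match($0, /status=[a-z]+/) {
             n[substr($0, RSTART, RLENGTH)]++
         }
         END { for (s in n) print s, n[s] }' "$1" | sort
}
```

With Postfix, the gap between the timestamps of successive attempts for the same recipient shows the retry interval, and the text in parentheses after status=deferred is the remote server's reply, which is where a greylisting or RBL message would appear.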

How to Secure Tomcat 6.x – Essential Steps

You can install Tomcat 6 to run under jsvc as user tomcat (not as root). Here's what I did last time I set it up:

I installed the Tomcat application under /usr/java/tomcat (CATALINA_HOME) and an instance under /var/lib/tomcat (CATALINA_BASE):

cd /usr/java
sudo tar xzvf ~/downloads/apache-tomcat-6.0.18.tar.gz
sudo ln -s apache-tomcat-6.0.18 tomcat
sudo /usr/sbin/useradd -d /var/lib/tomcat -c "Apache Tomcat" -m -s /sbin/nologin tomcat
cd /var/lib/tomcat
sudo mkdir logs work temp
sudo chown tomcat:tomcat logs temp work
(cd /usr/java/tomcat && sudo tar cvf - conf webapps) | sudo tar xvf -
sudo chmod -R g+rw webapps conf
sudo chown -R tomcat:tomcat webapps conf
cd webapps/
sudo rm -rf docs examples manager host-manager
cd ../conf
sudo chmod g+r *

Then I built the jsvc wrapper:

cd
tar xzvf downloads/apache-tomcat-6.0.18.tar.gz
tar xzvf apache-tomcat-6.0.18/bin/jsvc.tar.gz
cd jsvc-src
chmod +x configure
./configure --with-java=$JAVA_HOME
make
./jsvc --help
sudo cp jsvc /usr/local/sbin/

Finally, I tightened the permissions on the instance directories:

cd /var/lib/tomcat
sudo chmod -R 0700 conf
sudo chmod -R 0750 logs
sudo chmod -R 0700 temp
sudo chmod -R 0700 work
sudo chmod -R 0770 webapps/
sudo chown -R tomcat:tomcat conf
sudo chown -R tomcat:tomcat logs
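
To double-check the result, something like the following could be used. It is a sketch, not part of Tomcat: world_writable and not_owned_by are hypothetical helper names built on standard find tests.

```shell
#!/bin/sh
# Hypothetical audit helpers; not part of Tomcat.

# List files and directories under $1 that are world-writable.
# Symlinks are skipped because their own mode bits are not meaningful.
world_writable() {
    find "$1" ! -type l -perm -0002
}

# List anything under $2 that is not owned by user $1.
not_owned_by() {
    find "$2" ! -user "$1"
}
```

Run as root (the directories are not readable otherwise), world_writable /var/lib/tomcat and not_owned_by tomcat /var/lib/tomcat/conf should both print nothing if the steps above were applied.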

When you run Tomcat now, you'll need to start it using jsvc, so add this script as /etc/init.d/tomcat and symlink it appropriately:

#!/bin/sh
#
# tomcat       Startup script for the Apache Tomcat Server running under jsvc
#
# chkconfig: 345 85 15
# description: Apache Tomcat
# pidfile: /var/run/jsvc.pid

JAVA_HOME=/usr/java/jdk1.6.0_13
CATALINA_HOME=/usr/java/apache-tomcat-6.0.18
CATALINA_BASE=/var/lib/tomcat
JAVA_OPTS="-Djava.awt.headless=true"
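# NOTE: these options expose JMX on port 1099 without authentication or SSL.
# Firewall the port or enable authentication before relying on this setup.
JMX_OPTS="-Dcom.sun.management.jmxremote.port=1099 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false"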

DAEMON_APP=/usr/local/sbin/jsvc
TOMCAT_USER=tomcat

# Everything below should be okay
PID_FILE=/var/run/jsvc.pid
LOCK_FILE=/var/lock/tomcat

PATH=/sbin:/bin:/usr/bin
. /lib/init/vars.sh

. /lib/lsb/init-functions

[ -x $JAVA_HOME/bin/java ] || exit 0
[ -x $DAEMON_APP ] || exit 0
[ -d $CATALINA_HOME/bin ] || exit 0
[ -d $CATALINA_BASE ] || exit 0

RETVAL=0
prog="jsvc"

CLASSPATH=\
$JAVA_HOME/lib/tools.jar:\
$CATALINA_HOME/bin/commons-daemon.jar:\
$CATALINA_HOME/bin/bootstrap.jar

start() {
  # Start Tomcat
  log_daemon_msg "Starting Apache Tomcat"
  $DAEMON_APP \
    -user $TOMCAT_USER \
    -home $JAVA_HOME \
    -wait 10 \
    -pidfile $PID_FILE \
    -outfile $CATALINA_BASE/logs/catalina.out \
    -errfile $CATALINA_BASE/logs/catalina.out \
    $JAVA_OPTS $JMX_OPTS \
    -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager \
    -Djava.util.logging.config.file=$CATALINA_BASE/conf/logging.properties \
    -Dcatalina.home=$CATALINA_HOME \
    -Dcatalina.base=$CATALINA_BASE \
    -Djava.io.tmpdir=$CATALINA_BASE/temp \
    -cp $CLASSPATH \
    org.apache.catalina.startup.Bootstrap start 2>/dev/null 1>&2
  RETVAL=$?
  if [ 0 -eq $RETVAL ]; then
    touch $LOCK_FILE
    log_end_msg 0
  else
    log_end_msg 1
  fi
}

stop() {
  # Stop tomcat
  log_daemon_msg "Stopping Apache Tomcat"
  $DAEMON_APP \
    -stop \
    -pidfile $PID_FILE \
    org.apache.catalina.startup.Bootstrap 2>/dev/null 1>&2
  RETVAL=$?
  if [ 0 -eq $RETVAL ]; then
    rm -rf $LOCK_FILE
    log_end_msg 0
  else
    log_end_msg 1
  fi
}

restart() {
  stop
  sleep 5
  start
}

# See how we were called.
case "$1" in
  start)
    start
    ;;
  stop)
    stop
    ;;
  restart)
    restart
    ;;
  status)
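    # "status" is a Red Hat helper; the LSB functions sourced above provide status_of_proc
    status_of_proc -p "$PID_FILE" "$DAEMON_APP" "$prog"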
    ;;
  condrestart)
    [ -f $LOCK_FILE ] && restart || :
    ;;
  *)
    log_action_msg "Usage: $0 {start|stop|restart|status|condrestart}"
    exit 1
esac

exit $?
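
Before symlinking the script, it may be worth checking that it parses and is executable. The helper below is a hypothetical sketch (check_init_script is not a standard tool), and the registration commands in the comments depend on the distribution.

```shell
#!/bin/sh
# Hypothetical pre-install check; the function name is an assumption.

# Succeed only if the script parses cleanly and is executable.
# "sh -n" reads the script and checks its syntax without running any command.
check_init_script() {
    sh -n "$1" && [ -x "$1" ]
}

# Registration depends on the distribution, for example:
#   Debian/Ubuntu:  sudo update-rc.d tomcat defaults
#   Red Hat/CentOS: sudo chkconfig --add tomcat
```

For example, check_init_script /etc/init.d/tomcat && echo "looks OK" prints the message only when both checks pass.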
