Nagios Plugin – Take Process Snapshot When Load is High


We have configured Nagios to monitor server load with the check_load plugin via NRPE. It alerts when the load is high, but it has no option to take a snapshot of the top processes (as the top command shows) at that moment.

Are there any Nagios NRPE plugins that can do this?

Best Answer

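You can do it with an event handler, a command that Nagios runs when a service enters or recovers from a problem state. Here it asks the monitored host to record its top processes whenever Load_Average goes WARNING or CRITICAL.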

First, add an event handler to your Load_Average service definition:

define service{
    use                     generic-service
    host_name               xx
    service_description     Load_Average
    check_command           check_nrpe!check_load
    event_handler           processes_snapshot
    contact_groups          admin-sms
}

The processes_snapshot command is defined in commands.cfg:

define command{
    command_name    processes_snapshot
    command_line    $USER1$/eventhandlers/processes_snapshot.sh $SERVICESTATE$ $SERVICESTATETYPE$ $SERVICEATTEMPT$ $HOSTADDRESS$
}

Second, write the event handler script itself (processes_snapshot.sh), save it under $USER1$/eventhandlers/ and make it executable:

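#!/bin/bash
# Event handler for the Load_Average service.
# Arguments, in the order given in commands.cfg:
#   $1 = $SERVICESTATE$     $2 = $SERVICESTATETYPE$
#   $3 = $SERVICEATTEMPT$   $4 = $HOSTADDRESS$

case "$1" in
    WARNING|CRITICAL)
        # Ask NRPE on the monitored host to save a process snapshot.
        # Nagios runs the handler on every SOFT problem check and when the
        # service first enters a HARD problem state.
        /usr/local/nagios/libexec/check_nrpe -H "$4" -c processes_snapshot
        ;;
    *)
        # OK, UNKNOWN: nothing to do.
        ;;
esac

exit 0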
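Because the handler only shells out to check_nrpe, its branching can be checked on the Nagios server without touching the monitored host. The sketch below is a variant of the script above in which the check_nrpe path is read from a CHECK_NRPE environment variable (a hypothetical name introduced here, not a Nagios convention), so setting CHECK_NRPE=echo prints the command instead of running it. The function name handle_event is likewise only for illustration; the argument order matches the commands.cfg definition.

```shell
#!/bin/bash
# Dry-run-friendly variant of processes_snapshot.sh (a sketch).
# CHECK_NRPE is a hypothetical override for testing; it defaults to the
# check_nrpe path used in the original script.
CHECK_NRPE="${CHECK_NRPE:-/usr/local/nagios/libexec/check_nrpe}"

# Arguments, in the order passed by the processes_snapshot command:
#   $1 = $SERVICESTATE$,  $2 = $SERVICESTATETYPE$,
#   $3 = $SERVICEATTEMPT$, $4 = $HOSTADDRESS$
handle_event() {
    case "$1" in
        WARNING|CRITICAL)
            # Ask the remote NRPE daemon to run its processes_snapshot command.
            "$CHECK_NRPE" -H "$4" -c processes_snapshot
            ;;
        *)
            # OK and UNKNOWN: nothing to do.
            ;;
    esac
}

# Dispatch on the script's own arguments; "|| true" keeps the exit status 0
# even if check_nrpe returns non-zero (the snapshot command prints nothing).
handle_event "$@" || true
```

For example, `CHECK_NRPE=echo ./processes_snapshot.sh CRITICAL HARD 3 192.0.2.10` should print `-H 192.0.2.10 -c processes_snapshot`, while an OK or UNKNOWN state prints nothing.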

Finally, define the processes_snapshot command in nrpe.cfg on host xx (restart NRPE afterwards so it picks up the new command):

command[processes_snapshot]=top -cSbn 1 | tail -n +8 | sort -rn -k11 | head > /tmp/proc_snap.txt
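To see what that pipeline keeps, the same filters can be run against saved top output instead of a live process table. The sketch below wraps them in a function (snapshot_filter is a hypothetical name used only for illustration). Assuming top's default column layout (PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND), tail -n +8 drops the five summary lines, the blank line and the column header; sort -rn -k11 orders rows by field 11, which is TIME+ (cumulative CPU time), and -n compares only its leading number, i.e. whole minutes; head keeps ten rows. If you would rather rank by current CPU usage, that is field 9 (%CPU), and top in batch mode already sorts by %CPU by default.

```shell
#!/bin/bash
# Same filters as the nrpe.cfg command, applied to `top -b` output on stdin.
# snapshot_filter is a hypothetical helper name used for illustration only.
# Lines 1-7 of top's batch output are the summary block, a blank line and
# the column header, so process rows start at line 8. Field 11 is TIME+ in
# the default layout; -n compares its leading number (minutes), -r puts the
# largest first, and head keeps the top ten rows.
snapshot_filter() {
    tail -n +8 | sort -rn -k11 | head -n 10
}
```

On host xx this would be used as `top -cSbn 1 | snapshot_filter`.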

PS: I haven't tested this config.