I'm monitoring a few Linux servers using NRPE and most of the checks are working.
In fact, the only check that doesn't work is check_disk.
Running a remote check_disk from the Nagios server's terminal:
[root@nagios]# /usr/lib64/nagios/plugins/check_nrpe -H 10.200.X.X -c check_disk -a '-w 20% -c 10% /'
DISK OK - free space: / 271971 MB (97% inode=99%);| /=8321MB;236233;265762;0;295292
Running a local check_disk from the monitored server's terminal:
[root@Monitored ~]# /usr/lib64/nagios/plugins/check_nrpe -H 127.0.0.1 -c check_disk -a '-w 20% -c 10% /'
DISK OK - free space: / 271971 MB (97% inode=99%);| /=8321MB;236233;265762;0;295292
However, the check_disk service in Nagios returns the error "CHECK_NRPE: Error receiving data from daemon" on every server it checks, which leads me to believe the problem lies in the way the service or the command is written, so here they are:
The command from the /etc/nagios/nrpe.cfg file:
[root@Monitored ~]# grep disk /etc/nagios/nrpe.cfg
command[check_local_disk]=sudo /usr/lib64/nagios/plugins/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$
The command from the /etc/nagios/commands.cfg file:
# 'check_local_disk' command definition
define command{
    command_name    check_local_disk
    command_line    $USER1$/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$
}
The service from the services.cfg file:
define service{
    servicegroups           Basic Functionality
    hostgroup_name          db_hosts,vm_hosts,linux_hosts
    host_name               localhost
    service_description     Check free disk space /
    check_command           check_nrpe!check_local_disk!20%!10%!/
    use                     generic-service
}
Here's the command definition of check_nrpe:
define command{
    command_name    check_nrpe
    command_line    /usr/lib64/nagios/plugins/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}
Here's the NRPE information from /var/log/messages on the monitored server:
Jun 10 12:57:01 virt2 nrpe[755]: INFO: SSL/TLS initialized. All network traffic will be encrypted.
Jun 10 12:57:01 virt2 nrpe[756]: Starting up daemon
Jun 10 12:57:01 virt2 nrpe[756]: Server listening on 0.0.0.0 port 5666.
Jun 10 12:57:01 virt2 nrpe[756]: Server listening on :: port 5666.
Jun 10 12:57:01 virt2 nrpe[756]: Warning: Daemon is configured to accept command arguments from clients!
Jun 10 12:57:01 virt2 nrpe[756]: Listening for connections on port 0
Jun 10 12:57:01 virt2 nrpe[756]: Allowing connections from: 127.0.0.1,10.200.X.X
Do you have any idea how to solve this issue?
Thanks in advance
Best Answer
The data you show us are self-inconsistent.
On nagios, you show yourself using check_nrpe to invoke the check on server monitored, like so:
[root@nagios]# /usr/lib64/nagios/plugins/check_nrpe -H 10.200.X.X -c check_disk -a '-w 20% -c 10% /'
but when you show us monitored's nrpe.cfg file, the check is defined by a different name:
command[check_local_disk]=sudo /usr/lib64/nagios/plugins/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$
(I apologise for the lack of formatting, but I wanted the ability to highlight more than I wanted a monospaced font). It seems to me that the order of the parameters is also wrong, but I'm no expert on nrpe configurations that permit arguments to be passed.
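On the parameter point, it can help to expand the service's check_command by hand and see what the Nagios server actually sends. The bash sketch below assumes Nagios's usual macro handling: check_command is split on '!', the first field names the command definition, and the remaining fields fill $ARG1$, $ARG2$ and so on. The function name expand_check_command is hypothetical, and the sketch ignores escaped \! and every macro other than $HOSTADDRESS$ and $ARGn$.

```shell
#!/usr/bin/env bash
# Hypothetical helper (simplified model of Nagios macro substitution):
# field 0 of check_command is the command_name, fields 1..n fill
# $ARG1$..$ARGn$, and any $ARGn$ without a matching field becomes "".
# Escaped '\!' and macros other than $HOSTADDRESS$/$ARGn$ are NOT handled.
expand_check_command() {
  local check_command=$1 template=$2 host_address=$3
  local -a fields
  IFS='!' read -r -a fields <<< "$check_command"

  # Quoted patterns match the macro text literally (no globbing).
  local result=${template//'$HOSTADDRESS$'/"$host_address"}
  local i
  for i in 1 2 3 4 5 6 7 8 9; do
    result=${result//"\$ARG${i}\$"/"${fields[i]-}"}
  done
  printf '%s\n' "$result"
}

# Service check_command and check_nrpe command_line from the question:
expand_check_command 'check_nrpe!check_local_disk!20%!10%!/' \
  '/usr/lib64/nagios/plugins/check_nrpe -H $HOSTADDRESS$ -c $ARG1$' \
  '10.200.X.X'
```

With the check_nrpe definition from the question, this prints /usr/lib64/nagios/plugins/check_nrpe -H 10.200.X.X -c check_local_disk: only $ARG1$ is used, so 20%, 10% and / never leave the Nagios server, and the $ARG1$..$ARG3$ macros in the client's nrpe.cfg never receive them. Forwarding them would mean adding something like -a $ARG2$ $ARG3$ $ARG4$ to the check_nrpe command_line, which is worth verifying against your NRPE version.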
At any rate, the -c foo argument to check_nrpe must match the command[foo] entry in nrpe.cfg, and it doesn't. That can only mean that either something you've shown us isn't so, or you're proving that you can invoke check_nrpe
against the wrong server.
Edit: I think I've already been fairly clear about how to go about solving it. You say the GUI doesn't run the check correctly. OK. You're right that the normal next step is to run it from the command line, but it's quite important to run the same check against the same client. You've shown us the client's nrpe.cfg, so, assuming that the client really is 10.200.X.X, show us, from the server, the results of running check_nrpe against that client with -c check_local_disk and the same three arguments the service passes (20%, 10% and /). It would also be useful to see the check_nrpe entry from the server's commands.cfg file, just to confirm that it all lines up. We will then be doing exactly what the server is doing, so if that command fails, we can debug and fix the failure. If it succeeds, we have to drill down a little harder.
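To confirm on the client that the name passed to -c is actually defined, a small bash sketch can look for a matching command[NAME]= line. The function name nrpe_defines_command is hypothetical, and the sketch assumes every command definition lives in the one nrpe.cfg file (files pulled in with include= or include_dir= are not followed).

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if nrpe.cfg defines command[NAME]=...
# Only uncommented lines are matched; include=/include_dir= are NOT followed.
# Assumes NAME contains only letters, digits, '_' or '-' (no regex metachars).
nrpe_defines_command() {
  local name=$1 cfg=${2:-/etc/nagios/nrpe.cfg}
  # -s: stay quiet if the file is missing (treated as "not defined").
  grep -Eqs "^[[:space:]]*command\[${name}\]=" "$cfg"
}

# Example run on the monitored host, using the two names from this thread:
for name in check_disk check_local_disk; do
  if nrpe_defines_command "$name"; then
    echo "$name: defined"
  else
    echo "$name: NOT defined in nrpe.cfg"
  fi
done
```

With the nrpe.cfg shown in the question (where grep disk returns only the check_local_disk line), check_disk would be reported as not defined, which is exactly the mismatch described above.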