NRPE: “CHECK_NRPE: Error receiving data from daemon.” error in the GUI, but the check works from the terminal. What could be the cause?

centos6 nagios nrpe

I'm monitoring a few Linux servers using NRPE and most of the checks are working.
Actually, the only check which doesn't work is check_disk.
Running a remote check_disk from the Nagios server terminal:

[root@nagios]# /usr/lib64/nagios/plugins/check_nrpe -H 10.200.X.X -c check_disk -a '-w 20% -c 10% /'
DISK OK - free space: / 271971 MB (97% inode=99%);| /=8321MB;236233;265762;0;295292

Running a local check_disk from the monitored server's terminal:

[root@Monitored ~]# /usr/lib64/nagios/plugins/check_nrpe -H 127.0.0.1 -c check_disk -a '-w 20% -c 10% /'
DISK OK - free space: / 271971 MB (97% inode=99%);| /=8321MB;236233;265762;0;295292

In the GUI, however, this check_disk check returns the error "CHECK_NRPE: Error receiving data from daemon" on every server it checks, which leads me to believe the problem lies in the way the service or the command is written, so here they are:
The command from the /etc/nagios/nrpe.cfg file:

[root@Monitored ~]# grep disk /etc/nagios/nrpe.cfg 
command[check_local_disk]=sudo /usr/lib64/nagios/plugins/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$

The command from the /etc/nagios/commands.cfg file:

# 'check_local_disk' command definition
define command{
        command_name    check_local_disk
        command_line    $USER1$/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$
        }

The service from services.cfg file:

define service{
   servicegroups        Basic Functionality
   hostgroup_name       db_hosts,vm_hosts,linux_hosts
   host_name            localhost
   service_description  Check free disk space /
   check_command        check_nrpe!check_local_disk!20%!10%!/
   use                  generic-service
}

Here's the command definition of check_nrpe:

define command{
        command_name    check_nrpe
        command_line    /usr/lib64/nagios/plugins/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}

Here's the information regarding NRPE from /var/log/messages on the monitored server:

Jun 10 12:57:01 virt2 nrpe[755]: INFO: SSL/TLS initialized. All network traffic will be encrypted.
Jun 10 12:57:01 virt2 nrpe[756]: Starting up daemon
Jun 10 12:57:01 virt2 nrpe[756]: Server listening on 0.0.0.0 port 5666.
Jun 10 12:57:01 virt2 nrpe[756]: Server listening on :: port 5666.
Jun 10 12:57:01 virt2 nrpe[756]: Warning: Daemon is configured to accept command arguments from clients!
Jun 10 12:57:01 virt2 nrpe[756]: Listening for connections on port 0
Jun 10 12:57:01 virt2 nrpe[756]: Allowing connections from: 127.0.0.1,10.200.X.X

Do you have any idea how to solve this issue?
Thanks in advance

Best Answer

The data you show us are self-inconsistent.

On the Nagios server, you show yourself invoking the check against the monitored server with check_nrpe like so:

[root@nagios]# /usr/lib64/nagios/plugins/check_nrpe -H 10.200.X.X -c check_disk -a '-w 20% -c 10% /'

but in the monitored server's nrpe.cfg file, which you also show us, the check is defined under a different name:

command[check_local_disk]=sudo /usr/lib64/nagios/plugins/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$

(I apologise for the lack of formatting, but I wanted the ability to highlight more than I wanted a monospaced font). It seems to me that the order of the parameters is also wrong, but I'm no expert on nrpe configurations that permit arguments to be passed.
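On the parameter order, here is a rough Python sketch of how I understand the substitution to work. It is not NRPE's code: `substitute` is a made-up helper, and I am assuming that each argument after -a fills the next $ARGn$ in turn and that a macro with no matching argument expands to nothing.

```python
import re

# Command template copied from the nrpe.cfg line quoted above.
TEMPLATE = "sudo /usr/lib64/nagios/plugins/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$"

def substitute(template, args):
    """Fill $ARG1$, $ARG2$, ... from args (hypothetical helper, not NRPE code)."""
    out = template
    for i, value in enumerate(args, start=1):
        out = out.replace(f"$ARG{i}$", value)
    # Assumption: macros with no matching argument expand to nothing.
    return re.sub(r"\$ARG\d+\$", "", out)

# -a '-w 20% -c 10% /' arrives as ONE argument because of the quotes.
quoted = substitute(TEMPLATE, ["-w 20% -c 10% /"])
# -a 20% 10% / arrives as three separate arguments.
separate = substitute(TEMPLATE, ["20%", "10%", "/"])

print(quoted)    # the flags end up doubled: "... -w -w 20% -c 10% / -c ..."
print(separate)  # each value lands behind its own flag
```

If those assumptions hold, the quoted form does not fit a definition that already supplies -w, -c and -p itself, whereas three separate values would.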

At any rate, the -c foo argument to check_nrpe must match the command[foo] entry in nrpe.cfg, and it doesn't. That can only mean that either something you've shown us isn't so, or you're proving that you can invoke check_nrpe against the wrong server.
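A quick way to see the mismatch is to list the command names a given nrpe.cfg defines and compare them with what is passed to -c. The following is only a sketch: `defined_commands` is a name I made up, and the sample text is just the one line you showed us, not the whole file.

```python
import re

# Sample nrpe.cfg content: only the single line quoted in the question.
NRPE_CFG = """\
command[check_local_disk]=sudo /usr/lib64/nagios/plugins/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$
"""

def defined_commands(cfg_text):
    """Return a dict mapping each command[name] to its command line."""
    commands = {}
    for line in cfg_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = re.match(r"command\[([^\]]+)\]\s*=\s*(.*)", line)
        if match:
            commands[match.group(1)] = match.group(2)
    return commands

cmds = defined_commands(NRPE_CFG)
for name in ("check_disk", "check_local_disk"):
    status = "defined" if name in cmds else "NOT defined"
    print(f"{name}: {status}")
```

Against the real file you would read /etc/nagios/nrpe.cfg instead of the sample string. If check_disk turns out to be defined there after all, then the grep output in the question is not the whole story.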

Edit: I think I've already been fairly clear about trying to solve it. You say the GUI doesn't run the check correctly. OK. So you're right that the normal next step is to run it from the command line, but it's quite important to run the same check against the same client. You've shown us the client's nrpe.cfg, so assuming that the client really is 10.200.X.X, from the server show us the results of

[root@nagios]# /usr/lib64/nagios/plugins/check_nrpe -H 10.200.X.X -c check_local_disk -a '-w 20% -c 10% /'

It would be useful to see the check_nrpe entry from the server's commands.cfg file as well, just to confirm that it all lines up. We will then be doing what the server is doing, so if the command above fails, we can debug and fix the failure. If it succeeds, we have to drill down a little harder.
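To check whether it lines up, the service's check_command can be expanded by hand. Below is a simplified Python sketch of that expansion, assuming the check_nrpe definition quoted in the question is the one actually in effect. `expand` is a hypothetical helper; real Nagios macro handling covers more cases (escaped `!`, other macros) than this does.

```python
# check_nrpe definition as quoted in the question.
DEFINITIONS = {
    "check_nrpe": "/usr/lib64/nagios/plugins/check_nrpe -H $HOSTADDRESS$ -c $ARG1$",
}

def expand(check_command, definitions, host_address):
    """Split check_command on '!' and fill $HOSTADDRESS$ and $ARGn$ (simplified)."""
    name, *args = check_command.split("!")
    line = definitions[name].replace("$HOSTADDRESS$", host_address)
    for i, value in enumerate(args, start=1):
        # Arguments with no matching $ARGn$ in the command line are simply unused.
        line = line.replace(f"$ARG{i}$", value)
    return line

result = expand("check_nrpe!check_local_disk!20%!10%!/", DEFINITIONS, "10.200.X.X")
print(result)
# /usr/lib64/nagios/plugins/check_nrpe -H 10.200.X.X -c check_local_disk
```

With that definition, only check_local_disk reaches check_nrpe; the 20%, 10% and / have no $ARG2$, $ARG3$ or $ARG4$ to land in, so no -a arguments would be sent to the client at all. If that is what the server runs, it differs from every command line tried by hand so far, which would be worth ruling out before looking further.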