Linux – Nagios – NRPE: Unable to read output even though it seems like everything is configured correctly

I have a Nagios machine that monitors many Linux and Windows servers.
I started working here about a week ago and was given the task of cleaning up the Nagios setup.
As part of that, I had to add two more Linux servers to Nagios.
I've installed nagios-plugins and nrpe on both machines. I've also verified that port 5666 is open and listening on both servers, and I can see nrpe running in ps -aux |grep nrpe.
The user running the plugins is root.
EDIT: nrpe is configured to run as a daemon, so xinetd doesn't play a role here. In addition, grepping /var/log/messages for nrpe returns:

Sep 27 12:29:25 search-uk-1 nrpe[11708]: Starting up daemon
Sep 27 12:29:25 search-uk-1 nrpe[11708]: Listening for connections on port 5666
Sep 27 12:29:25 search-uk-1 nrpe[11708]: Allowing connections from: avalon.office.incredimail.com,avalon.qa.incredimail.com,lu2.int.incredimail.com,lu2.ext.incredimail.com,206.82.140.185
Sep 27 12:30:54 search-uk-1 nrpe[11753]: Error: Could not complete SSL handshake. 1
Sep 27 12:37:33 search-uk-1 nrpe[11708]: Caught SIGTERM - shutting down...
Sep 27 12:37:33 search-uk-1 nrpe[11708]: Cannot remove pidfile '/var/run/nrpe.pid' - check your privileges.
Sep 27 12:37:33 search-uk-1 nrpe[11708]: Daemon shutdown
Sep 27 12:37:33 search-uk-1 nrpe[12114]: Starting up daemon

SSL is not enabled on any of the servers which are being monitored correctly through Nagios.
Running the check_nrpe test from the Nagios server itself against the remote server returns:

[root@lu2 ~]# /usr/lib/nagios/plugins/check_nrpe -H 10.0.80.98 -p 5666
NRPE v2.12
[root@lu2 ~]#

This is the content of /etc/nagios/nrpe.cfg:

log_facility=daemon
pid_file=/var/run/nrpe.pid
server_port=5666
nrpe_user=nagios
nrpe_group=nagios
allowed_hosts=127.0.0.1

dont_blame_nrpe=0
debug=0
command_timeout=60
connection_timeout=300
command[check_users]=/usr/lib/nagios/plugins/check_users -w 5 -c 10
command[check_load]=/usr/lib/nagios/plugins/check_load -w 15,10,5 -c 30,25,20
command[check_hda1]=/usr/lib/nagios/plugins/check_disk -w 20% -c 10% -p /dev/mapper/VolGroup-lv_root
command[check_zombie_procs]=/usr/lib/nagios/plugins/check_procs -w 5 -c 10 -s Z
command[check_total_procs]=/usr/lib/nagios/plugins/check_procs -w 150 -c 200
include=/etc/nagios/command-im.cfg

I've diff'ed this file with a file from one of the working Nagios monitored hosts and found no difference.
Running the commands manually returns correct values.

None of the services are working on either of the two servers.

Your help is much appreciated.

Best Answer

I see one specific problem: allowed_hosts in your nrpe.cfg (on the client) should be set to the IP address of the Nagios monitoring server. Setting it to 127.0.0.1 only allows connections from the client itself, which would be right only if the client and the monitoring server were the same host, and that is unlikely.
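As a minimal sketch of that change, the relevant line in /etc/nagios/nrpe.cfg on the monitored host might look like the fragment below. The address 192.0.2.10 is a placeholder, not taken from the question; substitute the real IP address of your Nagios server, and restart the nrpe daemon afterwards so it picks up the change.

```
# /etc/nagios/nrpe.cfg on the monitored host
# 192.0.2.10 is a placeholder for the Nagios server's address (assumption)
allowed_hosts=127.0.0.1,192.0.2.10
```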

Another possibility: sometimes the remote nrpe does not have encryption enabled, so when Nagios connects with SSL, the connection to the remote nrpe fails. You can test with and without SSL by passing the -n switch (no SSL) to check_nrpe.
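One way to run that comparison from the Nagios server is sketched below. probe_nrpe is a hypothetical helper name, and the plugin path is the one from the question. The sketch assumes check_nrpe exits with status 0 when it gets a valid reply and non-zero otherwise.

```shell
#!/bin/sh
# Path taken from the question; set CHECK_NRPE to override it if the
# plugin lives elsewhere on your system.
CHECK_NRPE="${CHECK_NRPE:-/usr/lib/nagios/plugins/check_nrpe}"

# probe_nrpe HOST: hypothetical helper. Tries the default (SSL) mode first,
# then retries with -n (no SSL), and reports which mode got an answer.
probe_nrpe() {
    host="$1"
    if out=$("$CHECK_NRPE" -H "$host" 2>&1); then
        echo "SSL: $out"
    elif out=$("$CHECK_NRPE" -H "$host" -n 2>&1); then
        echo "no SSL: $out"
    else
        echo "no reply in either mode: $out"
        return 1
    fi
}
```

For example, probe_nrpe 10.0.80.98 prints which mode answered. If only the -n attempt succeeds, the daemon is probably running without SSL, and the check_nrpe command definition on the Nagios server would likely need -n as well.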

In any event, run /usr/lib/nagios/plugins/check_nrpe from the Nagios monitoring server against the nrpe on the remote host. You get a lot of information that way.

Example: /usr/lib/nagios/plugins/check_nrpe -H HOSTNAME

If nrpe is not running on the monitored host, then you're not going to get anything back.
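To confirm this on the monitored host, a quick sketch is to look for a listening socket on the NRPE port. has_listener is a hypothetical helper that reads ss -tln or netstat -tln output on standard input. It assumes the local address is the fourth column, which holds for the usual output of both tools.

```shell
#!/bin/sh
# has_listener PORT: hypothetical helper. Reads `ss -tln` or `netstat -tln`
# output on stdin and succeeds if any local address ends in :PORT.
has_listener() {
    awk -v port="$1" '
        # Column 4 is assumed to hold the local address, e.g. 0.0.0.0:5666
        $4 ~ (":" port "$") { found = 1 }
        END { exit found ? 0 : 1 }
    '
}
```

On the monitored host this could be used as: ss -tln | has_listener 5666 && echo "nrpe port is listening"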
