Nagios remote monitoring: NRPE Vs. SSH

Tags: monitoring, nagios, nrpe, performance

We use Nagios to monitor quite a few (~130) servers. We monitor CPU, disk, RAM and a few other things on each server. I've always used SSH to run the remote commands, purely because it requires little to no additional config on the remote server: just install nagios-plugins, create the nagios user and add the SSH key, all of which I've automated in a shell script. I've never actually considered the performance implications of using SSH over NRPE.

I'm not too bothered about the load hit on the Nagios server (it's probably over-specced for what it does; it's never been over 10% CPU), but we run each remote check every 30 seconds and each server has 5 different checks performed. I assume SSH requires more resources for each check, but is there a huge difference (i.e. enough of a difference to warrant the switch to NRPE)?

If it's any help, we monitor a mix of physical servers (normally with 8, 12 or 16 physical cores) and Amazon EC2 medium/large instances.

Best Answer

I've always believed the administration advantage of SSH (I use push_check) outweighs any additional load. Modern CPUs are so fast that the cost of encrypting a handful of bytes is pretty minimal, so it comes down to running two processes (SSH and the check script) vs one (check script fired off by NRPE).

For check scripts written in an interpreted language, I would expect the overhead of firing up the interpreter (Perl, Python, Bash) to exceed the CPU cost of starting an SSH session. Given modern CPUs, your machines are more likely to be disk or memory limited rather than CPU limited.
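If you would rather measure than guess, a rough timing sketch like the one below can compare the two costs on your own hardware. It is only an illustration: the helper name mean_startup_seconds is made up here, and the SSH line uses a placeholder host, so it is left commented out.

```python
import subprocess
import sys
import time


def mean_startup_seconds(argv, runs=5):
    """Average wall-clock time to start argv and wait for it to exit."""
    total = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(
            argv,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,
        )
        total += time.perf_counter() - start
    return total / runs


if __name__ == "__main__":
    # Cost of starting an interpreter that does nothing at all.
    interp = mean_startup_seconds([sys.executable, "-c", "pass"])
    print(f"interpreter startup: {interp * 1000:.1f} ms")

    # Placeholder host (an assumption, not from the question). Uncomment and
    # point it at a real server with key-based login to time an SSH round trip.
    # ssh = mean_startup_seconds(["ssh", "nagios@server.example.com", "true"])
    # print(f"ssh session: {ssh * 1000:.1f} ms")
```

Note that the SSH figure includes network latency and key exchange, so it is a wall-clock comparison rather than a pure CPU one.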

Provided your Nagios machine is coping (at ~130 servers with 5 checks each every 30 seconds, it has to set up roughly 20 SSH connections every second), I would err on the side of convenience.
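For reference, the connection rate follows directly from the numbers in the question (about 130 servers, 5 checks each, one run every 30 seconds). A quick back-of-the-envelope calculation, with a function name chosen just for this example:

```python
def checks_per_second(servers, checks_per_server, interval_seconds):
    """New check connections the Nagios host must open each second."""
    return servers * checks_per_server / interval_seconds


# Figures taken from the question: ~130 servers, 5 checks, every 30 seconds.
rate = checks_per_second(130, 5, 30)
print(f"{rate:.1f} new SSH connections per second")
```

This assumes every check opens its own SSH connection, with no connection sharing between checks on the same host.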

Not really an answer to your question, more of an argument that life is too short to worry about it :)

Related Solutions

Nagios with nrpe Service check timed out

Ok I finally got it working.

The problem was that both Nagios servers were performing service checks and reporting results to the master node, and all those checks were performed perfectly. The master node had service freshness checking enabled, so if the monitoring servers could not complete checks, the master server would schedule those checks itself.

Anyway, the new servers were on a new IP range, and by default the NRPE port was closed on the master server.

Opening the port solved the problem. Although it's still odd that it returned "Service check timed out" instead of "Socket timeout error".
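A quick way to rule this sort of thing out is to test plain TCP reachability of the NRPE port from the machine that actually runs the check (here, the master node). A minimal sketch, assuming NRPE listens on its default port 5666; the function name port_is_open and the host in the usage line are made up for illustration:

```python
import socket

NRPE_DEFAULT_PORT = 5666  # NRPE's default port; adjust if yours differs


def port_is_open(host, port=NRPE_DEFAULT_PORT, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

Usage would be something like port_is_open("newserver.example.com"). A True result only shows that the port is reachable; NRPE can still reject the caller through its allowed_hosts setting.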

NRPE unable to read output, but why

You have a rights problem.

Change the command to:

command[check_openmanage]=sudo /usr/lib/nagios/plugins/additional/check_openmanage -s -e -b ctrl_driver=0 bat_charge

(add sudo)

Then, add the nagios user to the sudoers file:

nagios ALL=(ALL) NOPASSWD:/usr/lib/nagios/plugins/additional/check_openmanage

Or you could just chmod the file... That also works.

If you are using CentOS, Red Hat, Scientific or Fedora, make sure to disable Defaults requiretty in the sudoers file.
