Nagios remote monitoring: NRPE Vs. SSH

monitoringnagiosnrpeperformance

We use Nagios to monitor quite a few (~130) servers. We monitor CPU, Disk, RAM and a few other things on each server. I've always used SSH to run the remote commands, purely because it requires little to no additional config on the remote server, just install nagios-plugins, create the nagios user and add the SSH key, all of which I've automated into a shell script. I've never actually considered the performance implications of using SSH over NRPE.

I'm not too bothered about the load hit on the Nagios server (It's probably over-speced for what it does, it's never been over 10% CPU), but we run each remote check every 30 seconds and each server has 5 different checks performed. I assume SSH requires more resources for each check but is there a huge difference? (I.E. enough of a difference to warrant the switch to NRPE).

If it's any help, we monitor a mix of physical servers (Normally with 8, 12 or 16 physical cores) and Amazon EC2 medium/large instances.

Best Answer

I've always believed the administration advantage of SSH (I use push_check) outweighs any additional load. Modern CPUs are so fast that the cost of encrypting a handful of bytes is pretty minimal, so it comes down to running two processes (SSH and the check script) vs one (check script fired off by NRPE).

For check scripts written in an interpreted language, I would expect the overhead of firing up the interpreter (Perl, Python, Bash) to exceed the CPU cost of starting an SSH session. Given modern CPUs, your machines are more likely to be disk or memory limited rather than CPU limited.

Provided your Nagios machine is coping -- it has to set up 20 SSH connections every second -- I would err on the side of convenience.

Not really an answer to your question, more of an argument that life is too short to worry about it :)