Munin and Nagios are really different tools.
From the official Munin website:
Munin is a networked resource monitoring tool that can help analyze
resource trends and "what just happened to kill our performance?"
problems. It is designed to be very plug and play. A default
installation provides a lot of graphs with almost no work.
Nagios is a monitoring (alerting) tool. Munin could be considered a replacement for Cacti.
We use both of them: Nagios and Munin.
- Nagios tells us in real time when something is wrong: a web server is down, the load average on a database server is too high, and so on.
- Munin shows the trends and history that explain why it happened.
There are multiple ways to solve this. Method 1 would be a secondary server running only NRPE, which acts as a proxy: the main Nagios server sends each check through the server running NRPE. Example:
From the main nagios server:
check_nrpe -H NRPEPROXYHOST -c check_ping -a 10.0.0.3 ....
Arguments after -a are handed to the check_ping command defined on the proxy, which must be configured to accept arguments. NRPEPROXYHOST runs the plugin as if it were the Nagios server and returns the result to the main server. In this setup the secondary server does not run Nagios or any other heavy daemons, just the NRPE daemon and the Nagios plugins to be run. This can even be configured on a gateway device, so it does not necessarily require a dedicated server.
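To make the proxy idea concrete, here is a minimal sketch that builds the check_nrpe command line for a check routed through the proxy and names the plugin exit codes. It assumes check_nrpe is on the PATH and that arguments are passed with its -a option; the helper names build_proxy_check and state_name are made up for illustration.

```python
from typing import List, Sequence

# Standard Nagios plugin exit codes.
STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def build_proxy_check(proxy_host: str, command: str,
                      args: Sequence[str] = ()) -> List[str]:
    """Return the argv that asks the NRPE proxy to run `command`.

    `command` must be defined in the proxy's nrpe.cfg (assumption:
    it is set up to accept arguments when `args` is non-empty).
    """
    argv = ["check_nrpe", "-H", proxy_host, "-c", command]
    if args:
        # -a passes the remaining values to the proxy's command definition.
        argv.append("-a")
        argv.extend(args)
    return argv


def state_name(exit_code: int) -> str:
    """Map a plugin exit code to its Nagios state name."""
    return STATES.get(exit_code, "UNKNOWN")


# Example: ping 10.0.0.3 from the proxy instead of from the main server.
cmd = build_proxy_check("NRPEPROXYHOST", "check_ping", ["10.0.0.3"])
```

The resulting list could be handed to subprocess.run, and the return code translated with state_name.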
======
Method 2 would be configuring a second instance of Nagios at the site, having it perform the active checks and submit the results to the main Nagios server. The main Nagios server would have all the checks configured with active checks disabled and passive checks enabled.
This configuration is a true distributed Nagios setup as documented on their site. It's quite a bit more robust, so if you see yourself having to perform several hundred or several thousand checks against these servers (every 5 minutes), this is your best choice. The secondary server is usually called a "satellite" Nagios instance, and the results are usually submitted to the main Nagios server via the NSCA protocol (which supports encryption). The main Nagios server listens for these via the nsca daemon, which writes them to the external command file for processing by Nagios.
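As a rough illustration of what a satellite submits, the sketch below formats one passive service result in two ways: the tab-separated line that send_nsca reads on standard input, and the PROCESS_SERVICE_CHECK_RESULT line that ends up in the external command file. The host and service names are made up, and the function names are my own.

```python
import time
from typing import Optional


def _check_code(return_code: int) -> None:
    # Nagios states: 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN.
    if return_code not in (0, 1, 2, 3):
        raise ValueError(f"invalid return code: {return_code}")


def nsca_line(host: str, service: str, return_code: int, output: str) -> str:
    """One service result in the format send_nsca expects on stdin."""
    _check_code(return_code)
    return f"{host}\t{service}\t{return_code}\t{output}\n"


def external_command(host: str, service: str, return_code: int, output: str,
                     timestamp: Optional[int] = None) -> str:
    """The same result as a line for the Nagios external command file."""
    _check_code(return_code)
    ts = int(time.time()) if timestamp is None else timestamp
    return (f"[{ts}] PROCESS_SERVICE_CHECK_RESULT;"
            f"{host};{service};{return_code};{output}\n")


# Hypothetical result for a host "web01" with a service called "HTTP".
line = nsca_line("web01", "HTTP", 2, "CRITICAL - connection timed out")
```

The host and service names have to match the definitions on the main server exactly, otherwise the passive result is not matched to a configured service.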
The downside is that you have to keep config files on two servers and make changes to both sets of configs: the hosts must be defined as passive on the main server and with active checks on the satellite server.
This scales very well and is the preferred solution for installations with tens of thousands of service checks to be performed. Also, look at building the configs on a central server, keeping them in revision control, and having a script on each Nagios server periodically check out the new configs and reload Nagios.
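The periodic checkout-and-reload script could look something like the sketch below. It assumes the configs live in a git checkout, that the nagios binary is on the PATH, and that the service is managed by systemd under the name nagios; all paths and the function names are hypothetical. It validates the config with nagios -v before reloading, so a broken commit does not take monitoring down.

```python
import subprocess
from typing import Callable, Sequence


def run(cmd: Sequence[str]) -> int:
    """Run a command and return its exit status."""
    return subprocess.run(list(cmd)).returncode


def sync_configs(repo_dir: str, main_cfg: str,
                 runner: Callable[[Sequence[str]], int] = run) -> bool:
    """Update the config checkout, verify it, then reload Nagios.

    Stops at the first failing step and returns False, so a failed
    update or a config that does not verify never triggers a reload.
    """
    steps = [
        ["git", "-C", repo_dir, "pull", "--ff-only"],  # fetch new configs
        ["nagios", "-v", main_cfg],                    # pre-flight check
        ["systemctl", "reload", "nagios"],             # assumed service name
    ]
    for cmd in steps:
        if runner(cmd) != 0:
            return False
    return True


# Typical use from cron (paths are placeholders):
#   sync_configs("/etc/nagios", "/etc/nagios/nagios.cfg")
```

Passing the runner in as a parameter keeps the sequence easy to dry-run without touching a live server.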
=====
Method 3
DNX (http://dnx.sourceforge.net/) is an awesome project that patches Nagios so that it can hand checks off to "node" Nagios servers. To the best of my knowledge, though, this configuration does not let you choose which checks are executed by which node (node affinity), or which checks must NOT be executed by a node. So this solution distributes load more than it provides a proxy into a secondary network.
Best Answer
Yes. Nagios has support for custom scripts and checks, better integration and more granular notification options. Monit is good for basic system checks and daemon monitoring. Nagios is more flexible, but is also more involved to install.
I find that Monit is good for single-host installations, but multi-Monit (M/Monit) really isn't that good of a central console solution. It's good for up/down views at a glance, but falls over with a larger number of hosts or when you need to monitor systems over the WAN. The interface is also too simple.
I find that multiple tools are often needed to provide a good view of an environment. Monit is great for making sure things are running. It's simple enough to get running and can alert if a process that should be present isn't. Think of ntp, sshd, crond, etc. Use Monit to take corrective actions based on that.
My approach over the past few years has been Monit for daemon and custom application monitoring via PID file, Observium or ORCA for graphing and trend analysis, and OpenNMS for up/down status and notifications. I've yet to find a suite that does it all cleanly: Observium doesn't do alerts, ORCA is graphing-only, and OpenNMS has great notifications and thresholding but ugly graphs.
I don't use Nagios because of the setup involved and my familiarity with other tools. I've inherited a few Nagios installations that went awry because of poor implementation. I find that OpenNMS + Monit + an RRDTool-based graphing solution works better for me.