Distribute Nagios to reduce false alarms


I'm currently running a single Nagios instance. From time to time I get false alarms about timeouts: for example, it reports that HTTP is down on some server, but when I open the site in my browser a few seconds later it loads quickly, and there is no trace of an error anywhere.

What can I do to reduce such false alarms?

I'm guessing that it's because of transient network issues on my monitoring server. I guess that setting up another monitoring server on a different network would greatly help, but how do I plug it into Nagios?

Is it at all possible with Nagios or do I have to switch to another monitoring system? I like my configs and, if possible, I'd like to stay with Nagios or something compatible (Icinga?)

Best Answer

Increase the threshold for alerting. For example, don't have it alarm after a single failure. Have it alarm only after 3 consecutive failures, and put a sane interval (1 or 2 minutes) between re-checks. That way you'll be notified if the service really is down for about 4-5 minutes, but not when a single check fails because of "transient network issues" on your monitoring server.
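As a rough sketch, that could look something like the service definition below. The host name `web01` and the `generic-service` template are placeholders I made up for illustration, and `check_http` is assumed to be defined in your commands already; adjust all three to match your own configs. The intervals are in units of `interval_length`, which defaults to 60 seconds.

```cfg
define service {
    use                     generic-service  ; assumed template, use your own
    host_name               web01            ; hypothetical host name
    service_description     HTTP
    check_command           check_http       ; assumed to exist in your commands
    check_interval          5                ; minutes between checks while the service is OK
    retry_interval          2                ; minutes between re-checks after a failure
    max_check_attempts      3                ; consecutive failures before a notification is sent
}
```

With these values the first failed check only puts the service into a soft state. Nagios re-checks it twice, 2 minutes apart, and notifies only if all three attempts fail, so a one-off timeout never reaches your inbox.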
