Nagios escalations, prematurely critical escalation after warning

monitoring nagios

In Nagios 3, I would like a service to be escalated after it has been CRITICAL for XX minutes. This works well for services that go straight from OK to CRITICAL. However, if the service has been in WARNING for more than XX minutes (say, disk usage that is slowly climbing) and then goes CRITICAL, the very first CRITICAL result triggers an escalation. Nagios is counting the warnings towards the escalation count, whereas we want it to escalate after 3 CRITICAL alarms, not 3 warnings and one critical.

Is there a way to stop warnings from counting towards the service escalation?

Here's an example of another user with the same problem and very similar configs: http://copilotco.com/mail-archives/nagios-users.2009/msg00310.html

Best Answer

As I don't use escalations in my Nagios setup, I am speaking blindly here, going only by the documentation for the service escalation definition.

You may want to consider the first_notification directive:

first_notification: This directive is a number that identifies the first notification for which this escalation is effective. For instance, if you set this value to 3, this escalation will only be used if the service is in a non-OK state long enough for a third notification to go out.

Also consider the escalation_options directive:

escalation_options: This directive is used to define the criteria that determine when this service escalation is used. The escalation is used only if the service is in one of the states specified in this directive. If this directive is not specified in a service escalation, the escalation is considered to be valid during all service states. Valid options are a combination of one or more of the following: r = escalate on an OK (recovery) state, w = escalate on a WARNING state, u = escalate on an UNKNOWN state, and c = escalate on a CRITICAL state. Example: If you specify w in this field, the escalation will only be used if the service is in a WARNING state.

So, to achieve what you want (escalation after 3 CRITICAL alarms), I would try a definition like this:

define serviceescalation{
    host_name              myhost
    service_description    Disk Usage
    first_notification     3
    last_notification      0
    notification_interval  10
    contact_groups         admins
    escalation_options     c,r
    }
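To make the interplay of the two directives easier to reason about, here is a small Python sketch of the rule as the documentation quoted above describes it. It is only a simplified model, not Nagios source code: the function name `escalation_applies` and the `STATE_FLAGS` table are made up for illustration, and treating `last_notification 0` as "no upper bound" follows my reading of the Nagios documentation.

```python
# Simplified model of first_notification / last_notification /
# escalation_options. Hypothetical helper, NOT taken from Nagios itself.
STATE_FLAGS = {"r": "OK", "w": "WARNING", "u": "UNKNOWN", "c": "CRITICAL"}


def escalation_applies(notification_number, state,
                       first_notification, last_notification,
                       escalation_options=None):
    """Return True if this escalation would be used for the notification."""
    # Before first_notification the escalation is not yet effective.
    if notification_number < first_notification:
        return False
    # Assumption: last_notification 0 means there is no upper bound.
    if last_notification != 0 and notification_number > last_notification:
        return False
    # No escalation_options given: valid during all service states.
    if not escalation_options:
        return True
    allowed = {STATE_FLAGS[flag.strip()]
               for flag in escalation_options.split(",")}
    return state in allowed


# Values from the serviceescalation definition above.
rule = dict(first_notification=3, last_notification=0,
            escalation_options="c,r")

print(escalation_applies(2, "CRITICAL", **rule))  # False
print(escalation_applies(3, "WARNING", **rule))   # False
print(escalation_applies(3, "CRITICAL", **rule))  # True
```

This model only filters on the notification number and the current state. Whether WARNING notifications sent earlier advance that notification number is decided by Nagios itself, so the definition is worth testing against the exact scenario from the question.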

Hope it helps... and works!
