Nagios Check Service Frequency Based on Status – Configuration Guide

nagios

I am trying to detect disk thrashing by monitoring the si and so columns from the vmstat command. I monitor other services with Nagios, and their service checks run every 5 minutes. For this thrashing service, I want Nagios to check every 20 minutes and, if the returned status is not OK (i.e. WARNING or CRITICAL), to recheck every 3 minutes until the status returns to OK. The check interval for all other services should remain unchanged.

I am new to Nagios and any help on this would be really appreciated.

Best Answer

Assuming the interval_length directive is left at its default of 60 (so one interval unit equals one minute):

$ grep interval_length /usr/local/nagios/etc/nagios.cfg 
# This value works of the interval_length you specify later.  If you leave
# actual seconds rather than a multiple of the interval_length variable.
interval_length=60

For the special service, define a separate template in /usr/local/nagios/etc/objects/templates.cfg:

define service{
        name                            special-service    
        ...
        max_check_attempts              3           
        normal_check_interval           20         
        retry_check_interval            3           
        notification_interval           60   
        ...   
        }

Pay attention to:

  • normal_check_interval: the service is checked every 20 minutes under normal conditions.
  • retry_check_interval: the number of minutes to wait before scheduling a recheck once the service has changed to a non-OK state. Note that once the service has been checked max_check_attempts times without a change in its status, it enters a HARD state and reverts to being scheduled at the normal_check_interval rate.
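
To make the interaction between these directives concrete, here is a rough sketch of the resulting check timeline. It is a simplified model written for illustration, not Nagios code: the function name schedule and its output format are made up, and it ignores details such as check latency and flapping.

```shell
#!/bin/sh
# Simplified model of how one service's checks get scheduled (illustration only).
# Usage: schedule NORMAL_INTERVAL RETRY_INTERVAL MAX_CHECK_ATTEMPTS RESULT...
# Each RESULT is OK or any non-OK label (WARNING, CRITICAL, ...).
schedule() {
    normal=$1; retry=$2; max=$3; shift 3
    t=0          # minutes elapsed since the first check
    attempt=1    # current check attempt, counted the way Nagios reports it
    for result in "$@"; do
        if [ "$result" = "OK" ]; then
            # Recovery or steady OK: reset the counter, use the normal interval.
            attempt=1; state=OK; wait=$normal
        elif [ "$attempt" -lt "$max" ]; then
            # Non-OK but attempts remain: SOFT state, recheck at the retry interval.
            state=SOFT; wait=$retry; attempt=$((attempt + 1))
        else
            # Attempts exhausted: HARD state, back to the normal interval.
            state=HARD; wait=$normal
        fi
        echo "t=${t}m result=$result state=$state next_check_in=${wait}m"
        t=$((t + wait))
    done
}

# Values from the template above: 20-minute normal interval, 3-minute retry
# interval, max_check_attempts of 3.
schedule 20 3 3 OK WARNING WARNING WARNING WARNING OK
```

In this model the 3-minute rechecks stop once the attempt count reaches max_check_attempts. Raising max_check_attempts prolongs them, at the cost of delaying the HARD state and the notifications that depend on it.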

Then use this template for your service:

define service{
    use                     special-service
    host_name               xx
    service_description     yy
    check_command           zz
    contact_groups          admins
    }

You may also need to define a service escalation to change the notification_interval based on the service state, something like this:

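define serviceescalation{
    host_name               xx
    service_description     yy
    first_notification      1       ; required directive; 1 is an assumption (escalate from the first notification)
    last_notification       0       ; 0 keeps the escalation active until recovery
    notification_interval   10
    escalation_options      w,u,c   ; comma-separated list, without brackets
    contact_groups          admins
    }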

This escalation applies while the service is in a WARNING, UNKNOWN, or CRITICAL state, and it gives you a new notification interval of 10 minutes.
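
As for the check itself, the question mentions watching si and so from vmstat. Below is a minimal sketch of the threshold logic such a plugin might use. The function name classify_swap, the "SWAP" label, and the thresholds 100 and 500 are placeholders I made up, not anything Nagios ships; only the exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) follow the standard plugin convention.

```shell
#!/bin/sh
# classify_swap SI SO WARN CRIT  (hypothetical helper, for illustration)
# Prints a one-line status and returns the matching Nagios plugin exit code:
# 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN.
classify_swap() {
    si=$1; so=$2; warn=$3; crit=$4

    # Anything that is not a non-negative integer maps to UNKNOWN.
    for v in "$si" "$so" "$warn" "$crit"; do
        case "$v" in
            ''|*[!0-9]*) echo "SWAP UNKNOWN - non-numeric input"; return 3 ;;
        esac
    done

    # Compare the larger of swap-in and swap-out against the thresholds.
    worst=$si
    if [ "$so" -gt "$worst" ]; then
        worst=$so
    fi

    if [ "$worst" -ge "$crit" ]; then
        echo "SWAP CRITICAL - si=$si so=$so"; return 2
    elif [ "$worst" -ge "$warn" ]; then
        echo "SWAP WARNING - si=$si so=$so"; return 1
    fi
    echo "SWAP OK - si=$si so=$so"; return 0
}

# Possible wiring in a plugin script (assumes procps vmstat, where si and so
# are the 7th and 8th columns; the second sample is used because the first
# line reports averages since boot):
#   set -- $(vmstat 1 2 | tail -n 1)
#   classify_swap "$7" "$8" 100 500
#   exit $?
```

The thresholds are in whatever units your vmstat reports for si and so, so pick values that match what you observe on a healthy host.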