Acknowledge/ignore a Nagios alert for X min/hrs/days

monitoring nagios

I have a Nagios service which has correctly noticed a problem. I am running a command on that machine that will fix the problem, but it will take a few hours to run. Until then, Nagios will (correctly) keep detecting it as a problem. I can "acknowledge" the problem so that I won't get notifications again, but if something goes wrong with my cleanup command, I won't know about it unless I remember to recheck.

Is there any way to "acknowledge" a Nagios problem for a certain amount of time, and after that time, if it's still a problem, have Nagios send an alert as per normal? Sort of an "ignore this problem for the next X mins/hours/days" option, a "snooze" button for a Nagios problem?

Best Answer

Yes, that's called downtime. To quote from the documentation:

When a host or service is in a period of scheduled downtime, Nagios Core will not allow normal notifications to be sent out for the host or service. However, a "DOWNTIMESTART" notification will get sent out for the host or service, which will serve to put any admins on notice that they won't receive upcoming problem alerts.

When the scheduled downtime is over, Nagios Core will allow normal notifications to be sent out for the host or service again. A "DOWNTIMEEND" notification will get sent out notifying admins that the scheduled downtime is over, and they will start receiving normal alerts again.

There are two variants of downtime:

  • Fixed downtimes start and end at the exact times you specify
  • Flexible downtimes start as soon as the service enters a failed state (but after the specified start time) and last for a fixed duration (but not longer than the specified end time)

In this case you want a fixed downtime that starts now and ends when you expect your command to finish. If the service is still failing when the downtime ends, normal notifications resume.
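Besides the web interface, a fixed downtime like this can be scheduled by writing a SCHEDULE_SVC_DOWNTIME external command to the Nagios command file. The following is a minimal sketch, not a definitive implementation: the helper names `build_downtime_command` and `submit` are hypothetical, the command file path is an assumption (it is set by `command_file` in nagios.cfg and varies by installation), and the host, service, author and comment values in the usage note are placeholders.

```python
import time


def build_downtime_command(host, service, hours, author, comment, now=None):
    """Build a SCHEDULE_SVC_DOWNTIME line for a fixed downtime starting now.

    Field order follows the Nagios external command format:
    host;service;start;end;fixed;trigger_id;duration;author;comment
    Values must not contain ';' because it is the field separator.
    """
    now = int(time.time()) if now is None else int(now)
    start = now
    end = now + int(hours * 3600)
    fixed = 1        # 1 = fixed downtime, 0 = flexible
    trigger_id = 0   # 0 = not triggered by another downtime
    duration = end - start  # only used by flexible downtimes, kept consistent here
    fields = [
        "SCHEDULE_SVC_DOWNTIME", host, service,
        start, end, fixed, trigger_id, duration, author, comment,
    ]
    return "[{}] {}".format(now, ";".join(str(f) for f in fields))


def submit(command_line, command_file="/usr/local/nagios/var/rw/nagios.cmd"):
    """Write one command line to the Nagios command pipe.

    The default path is an assumption (a common location for source
    installs); check the command_file setting in your nagios.cfg.
    """
    with open(command_file, "w") as fh:
        fh.write(command_line + "\n")
```

For example, `submit(build_downtime_command("web01", "Disk Space", 3, "admin", "cleanup running"))` would request a three-hour fixed downtime for a service named "Disk Space" on a host named "web01", provided external commands are enabled and the calling user can write to the command file.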