Nagios ‘exclude’ directive does not exclude timeframes

I tried to set up my Nagios install to check the availability of an SMTP service only outside its backup hours. Unfortunately, it seems to be ignoring this configuration:

# a timeperiod to check only outside of zimbra's backup hour
# (combining the first with the second timeperiod)
define timeperiod {
    timeperiod_name     zimbra-backups
    alias               When zimbra is being backed up
    sunday              04:00-05:00
    monday              04:00-05:00
    tuesday             04:00-05:00
    wednesday           04:00-05:00
    thursday            04:00-05:00
    friday              04:00-05:00
    saturday            04:00-05:00
}
define timeperiod {
    timeperiod_name     always-except-zimbra-backups
    alias               24x7 except backup time
    sunday              00:00-24:00
    monday              00:00-24:00
    tuesday             00:00-24:00
    wednesday           00:00-24:00
    thursday            00:00-24:00
    friday              00:00-24:00
    saturday            00:00-24:00
    exclude             zimbra-backups
}

This time period is then used by a new host and its service:

define host {
    host_name               mailserver-except-backups
    alias                   mail server (outside backup hours)
    address                 yaddayadda
    notification_options    d,u,r,f
    use                     my-default-host
    check_period            always-except-zimbra-backups
}
define service {
    host_name                   mailserver-except-backups
    service_description         SMTP service
    check_command               check_smtp!-t 30
    use                         my-default-service
    check_interval              2
    retry_interval              1
}

I can't see what's wrong. Any clue?
Here is one of the notification e-mails:

***** Nagios *****

Notification Type: PROBLEM

Service: SMTP service
Host: mail server (outside backup hours)
Address: yaddayadda
State: CRITICAL

Date/Time: Sat Apr 27 04:03:16 CEST 2013

Additional Info:

Connection refused

Nagios is Core 3.3.1, running on OpenBSD 5.2.

Best Answer

Host checks and service checks are almost entirely unrelated, except for an implicit dependency of the service on its associated host.

You have applied your custom time period to the host check, but the service check uses whichever check_period is defined in the template it inherits from (my-default-service). Add a check_period to your service definition to fix this.
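A minimal sketch of that fix, reusing the names from the question: the service definition is unchanged except for an explicit check_period, which should take precedence over whatever the my-default-service template sets (assuming the template is otherwise left as-is).

```
define service {
    host_name                   mailserver-except-backups
    service_description         SMTP service
    check_command               check_smtp!-t 30
    use                         my-default-service
    # explicitly override the check_period inherited from my-default-service
    check_period                always-except-zimbra-backups
    check_interval              2
    retry_interval              1
}
```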

Alternatively, you could use your custom time period as a notification_period for the service, if you just want to suppress notifications during the backup.
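Sketched under the same naming as the question, the notification-only variant keeps checking on the template's schedule but only sends notifications inside the custom period (this assumes my-default-service does not need any other overrides):

```
define service {
    host_name                   mailserver-except-backups
    service_description         SMTP service
    check_command               check_smtp!-t 30
    use                         my-default-service
    # checks still run on the template's check_period;
    # notifications are suppressed outside this period
    notification_period         always-except-zimbra-backups
    check_interval              2
    retry_interval              1
}
```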

Also, note that exclusions might be broken in 3.3.x, as indicated by this entry in the Nagios version history under 3.2.0:

Known issue: Service checks that are defined with timeperiods that contain "exclude" directives are incorrectly re-scheduled. Don't use these for now - we'll get this fixed for 3.4

... so you might want to upgrade to 3.4.x or 3.5.0 (latest as of this writing).
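If upgrading is not immediately possible, one possible workaround (an assumption, not something the version history suggests) is to avoid the exclude directive altogether by listing the allowed hours as comma-separated ranges per day. The timeperiod name below is hypothetical:

```
# hypothetical timeperiod: same hours as always-except-zimbra-backups,
# but written without an "exclude" directive
define timeperiod {
    timeperiod_name     outside-zimbra-backups
    alias               24x7 except 04:00-05:00
    sunday              00:00-04:00,05:00-24:00
    monday              00:00-04:00,05:00-24:00
    tuesday             00:00-04:00,05:00-24:00
    wednesday           00:00-04:00,05:00-24:00
    thursday            00:00-04:00,05:00-24:00
    friday              00:00-04:00,05:00-24:00
    saturday            00:00-04:00,05:00-24:00
}
```

The service's check_period (or notification_period) would then point at this period instead of the one using exclude.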