Overriding a Nagios hostgroup service with a host-specific service


I am currently experimenting with assigning a set of services to all hosts in a hostgroup, and this works fine.

My issue is that I then want to override, on a specific host, some of the service definitions that host inherits from the hostgroup. For example, one particular Linux server needs its PING check thresholds raised above the default.

So, for example, I'd like a host in the linux-server hostgroup to inherit several services (SSH, Disk, PING, etc.). For specific services, though, I want to override the inherited values by defining a service specific to that host, with its own custom values.

For example, here is a host in linux-server with a custom PING service definition:

define host {
    use             n1-host
    host_name       server-01
    hostgroups      linux-server
    alias           Test Linux Server
    parents         my-gateway,upstream-gateway
    address         server01.test.com
}

define service {
    use                     generic-service
    host_name               server-01
    service_description     PING
    check_command           check_ping!100.0,5%!400.0,15%
}

Unfortunately, even though the host name and service description match those of the group-level PING check, only one PING service is listed for server-01, and it is the group-level check, not the host-level one.

It does seem to register on some level, as my Nagios log shows:

Jul 16 19:12:27 localhost nagios: Warning: Duplicate definition found for service 'PING' on host 'server-01'

Ultimately, though, it does not work: if I check the performance data of the service check results, the thresholds included in the data are those of the group check, not the host check.

  • My understanding, however, is that a change was made around version 3.2.0 so that host-level services take precedence over hostgroup-level services. I am currently running 3.4.1, so I would expect this to work.

  • Some links that led me to believe this feature should already be implemented:

  • Furthermore, I've checked the xdata/xodtemplate.c file in the 3.4.1 source code, and at a glance the "skip list" logic does seem intended to give host-level checks precedence over hostgroup checks. Admittedly, my analysis is primitive.

  • I know that it may be possible to exclude certain hosts from a group, but this won't work for me: a hostgroup may have multiple services in it, and I wouldn't want all of those services removed from the host.

  • I also find it unintuitive to maintain a list of exclusions separate from where the host itself is defined. For example, one of the above links (the second one) gave the following advice:

In the service definition, add a line under "hostgroup_name":
"host_name !zlinux_hostname"

This will exclude the zlinux host from the service check.

To me, this is not an ideal solution: we could end up having to make many exceptions, which seems like it would be tough to maintain.
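For comparison, here is roughly what that exclusion workaround would look like when applied to the group-level PING service from this question (shown under Additional Bits). This is a sketch that assumes the `!` exclusion syntax in a service's `host_name` directive; the host-specific PING service for server-01 stays exactly as defined earlier. Every further exception means appending another `!hostname` entry to this one definition, which is the maintenance burden described above.

```nagios
# Group-level PING service, with server-01 excluded so that only
# its host-specific PING definition applies to that host.
define service {
    use                     generic-service
    hostgroup_name          linux-server
    host_name               !server-01
    service_description     PING
    check_command           check_ping!100.0,2%!400.0,10%
}
```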

If anyone has any advice or insights on how to get this working, I'd very much appreciate it!

Additional Bits

Currently, I'm defining my group-level PING service like so:

define hostgroup {
    hostgroup_name          linux-server
    alias                   Linux Servers
}

define service {
    use                     generic-service
    hostgroup_name          linux-server
    service_description     PING
    check_command           check_ping!100.0,2%!400.0,10%
}
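A different technique, not taken from the original thread, keeps the per-host value inside the host definition itself: Nagios 3 custom object variables. The hostgroup service reads its thresholds from host macros, and individual hosts override only the variables they need. This is a sketch under assumptions: it assumes you can add defaults to the existing `n1-host` template (whose real contents aren't shown in the question), and the variable names `_PING_WARN` and `_PING_CRIT` are hypothetical. Custom variable names are upper-cased and referenced as `$_HOST<name>$` with the leading underscore dropped.

```nagios
# Hypothetical additions to the existing n1-host template
# (its other directives are omitted here).
define host {
    name            n1-host
    register        0
    _PING_WARN      100.0,2%
    _PING_CRIT      400.0,10%
}

# Group-level PING service reading its thresholds from host macros.
define service {
    use                     generic-service
    hostgroup_name          linux-server
    service_description     PING
    check_command           check_ping!$_HOSTPING_WARN$!$_HOSTPING_CRIT$
}

# server-01 overrides only the thresholds it needs to change.
define host {
    use             n1-host
    host_name       server-01
    hostgroups      linux-server
    alias           Test Linux Server
    parents         my-gateway,upstream-gateway
    address         server01.test.com
    _PING_WARN      100.0,5%
    _PING_CRIT      400.0,15%
}
```

With this arrangement there is only one PING service definition, so no duplicate-definition warning arises, and every other host in linux-server keeps the template defaults.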

Best Answer

I know it's an old post, but I ran across this question while looking for something else. I'm not much of a Nagios expert, but I do love it.

Any check that you place in a hostgroup applies to every host in that hostgroup (which you knew already). If you create the same check, with the same service_description, in the host's cfg, it overrides the hostgroup check.

Anyhow, here is how I do it:

1. Set up a hostgroup cfg file with the checks in it. Here's my basic C: drive space check.

define service{
    use         generic-service
    hostgroup_name      windows-servers
    service_description C: Drive Space
    notification_period     workhours
    check_command       check_nt!USEDDISKSPACE!-l c -w 80 -c 90
    }

2. However, one server runs with much less free space than the norm, so in its host cfg I have:

define service{
    use         generic-service
    host_name       ServerName
    service_description C: Drive Space
    check_command       check_nt!USEDDISKSPACE!-l c -w 95 -c 99
    notification_period     workhours
    }

Now the service check alerts at 80% (warning) and 90% (critical) for all hosts in the hostgroup, except the host whose own definition carries the changed values (95% and 99%).

Arranging it this way means the host definition only needs to contain custom services and the service checks that deviate from the norm.

I'm not sure whether this is common practice, but this article blew my mind when it came to setting up the config files. I was already tired of editing these humongous text files, and this made it so much easier.
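As a rough illustration of that split between hostgroup files and host files, the main nagios.cfg can load them with its `cfg_file` and `cfg_dir` directives (`cfg_dir` reads every `.cfg` file in the directory). The paths below are hypothetical; adjust them to wherever your object files actually live.

```nagios
# Hypothetical nagios.cfg layout: shared hostgroup services in one file,
# and one file per host (holding any overrides) in a directory.
cfg_file=/usr/local/nagios/etc/objects/hostgroups/windows-servers.cfg
cfg_dir=/usr/local/nagios/etc/objects/hosts
```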

Anyway, I hope that helps.
