The correct way to stop the nagios ndo2db daemon

nagios

What is the correct way to kill Nagios's ndo2db daemon?

When I shut down nagios and ndo2db, I run the following:

/etc/init.d/nagios stop
/etc/init.d/ndo2db stop

and I see the following in nagios.log:

[1311865619] Caught SIGTERM, shutting down...
[1311865619] Successfully shutdown... (PID=12422)
[1311865619] ndomod: Shutdown complete.
[1311865619] Event broker module '/usr/local/nagios/bin/ndomod.o' deinitialized successfully.

However /etc/init.d/ndo2db stop outputs the following message to the console:

ndo2db was not running… could not stop

If I run ps -ax | grep nagios I still see an ndo2db process running:

12381 ? Ss 0:00 /usr/local/nagios//bin/ndo2db -c /usr/local/nagios//etc/ndo2db.cfg

I then have to kill it manually before restarting ndo2db; otherwise I get:

[root@nag01 nagios]# /etc/init.d/ndo2db start
Starting ndo2db:Could not bind socket: Address already in use
 done.
[root@nag01 nagios]#

Is there a cleaner way of doing this?

I'm running:

Nagios 3.2.3 built from source
NDO Utils 1.4b9 built from source
Centreon 2.2.1 Stable
CentOS 5.5 x64
MySQL 5.5 x64

Update:

One odd(?) thing I've noticed is that when ndo2db and nagios are running I see two instances of ndo2db:

12753 ? Ss 0:00 /usr/local/nagios//bin/ndo2db -c /usr/local/nagios//etc/ndo2db.cfg
12792 ? S  0:00 /usr/local/nagios//bin/ndo2db -c /usr/local/nagios//etc/ndo2db.cfg

Is this normal? If so, my guess is that the stop part of the init.d script is only killing one of the two processes?

Best Answer

I found the culprit - it was Centreon's config file builder.

ndo2db has a lock_file setting which is missing from the Centreon config UI.

When Centreon generates the config files it also generates ndo2db.cfg - but without the lock_file configuration value.

There's an open issue about this:

No lock_file field in Configuration - Centreon - ndo2db.cfg

Having spelunked through the source code: when ndo2db daemonises and there is no lock_file setting, it simply carries on, and no lock file containing the PID is written.

This of course means that the stop function in the init script can't identify the ndo2db process ID, so it can't kill the process.
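
To illustrate why the missing lock file matters, here is a minimal sketch of the kind of stop logic an init script commonly uses. It is not the actual ndo2db init script: the function name stop_from_lockfile and its messages are hypothetical, and it assumes the lock file contains nothing but the daemon's PID.

```shell
# Hypothetical sketch, not the shipped ndo2db init script.
# Assumes the lock file contains only the daemon's PID.
stop_from_lockfile() {
    lockfile="$1"
    if [ ! -f "$lockfile" ]; then
        # Without a lock file there is no PID to signal, so the
        # script can only report that the daemon is "not running".
        echo "not running (no lock file)"
        return 1
    fi
    pid=$(cat "$lockfile")
    if kill "$pid" 2>/dev/null; then
        rm -f "$lockfile"
        echo "stopped $pid"
        return 0
    fi
    # The file exists but the PID in it could not be signalled.
    echo "stale lock file"
    return 1
}
```

With no lock file on disk, a function like this takes the first branch even though the daemon is still running, which would match the "was not running" message shown in the question.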

Update:

To resolve this issue I manually added a new column to the cfg_ndo2db table in the centreon database:

ALTER TABLE `cfg_ndo2db`
    ADD COLUMN `lock_file` VARCHAR(255) NULL DEFAULT NULL;

I then populated it with the path of my ndo2db lock file:

UPDATE `cfg_ndo2db` SET `lock_file`='/usr/local/nagios/var/ndo2db.lock' WHERE `id`=1;

This forces Centreon to write the lock_file setting each time the config is generated. It also appears to survive upgrades, though I'd always check the database upgrade scripts in case a later release adds the column itself as an undocumented fix.
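
After regenerating the config, it may be worth confirming that the setting actually made it into ndo2db.cfg. The following is a small sketch of such a check; get_lock_file is a hypothetical helper name, and it assumes the file uses plain key=value lines.

```shell
# Hypothetical helper: prints the lock_file value from an
# ndo2db.cfg-style file, or returns non-zero when there is no
# uncommented lock_file line.
get_lock_file() {
    cfg="$1"
    # Strip "lock_file=" (allowing surrounding whitespace) and print
    # what remains; commented-out lines do not match.
    value=$(sed -n 's/^[[:space:]]*lock_file[[:space:]]*=[[:space:]]*//p' "$cfg" | tail -n 1)
    [ -n "$value" ] || return 1
    printf '%s\n' "$value"
}
```

For example, `get_lock_file /usr/local/nagios/etc/ndo2db.cfg || echo "lock_file missing"` would flag a generated config that lacks the setting.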

Related Solutions

Cannot get check_uptime plugin for Nagios to work. Unknown status

Never mind. I figured it out.

Looks like I have to invoke check_nrpe in /usr/local/nagios/etc/objects/hosts/machine1.cfg as follows:

    define service{
        use                             generic-service
        host_name                       machine1
        service_description             Uptime
        check_command                   check_nrpe!check_uptime
    }

Edit: It also looks like the check_uptime command should not be invoked with the -H $HOSTADDRESS$ parameter in commands.cfg on the server end. Once that is removed, it works on the server.

Hope this helps somebody else.

Nagios NRPE check_procs reporting incorrect number

I had a similar problem. check_procs internally calls /bin/ps axwo 'stat uid pid ppid vsz rss pcpu comm args', lists the processes, and then counts them. If you have configured nagios to run as a different user, it uses sudo to execute the command, and that is where the problem lies. If you type sudo ps -AF | grep sudo, some distributions return only "grep sudo", while others return both "sudo ps -AF" and "grep sudo". Because check_procs counts all processes, you will get different results on different machines. Unfortunately I do not yet have a solution for forcing check_procs not to count sudo processes.
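
The size of the discrepancy can be seen by counting a process listing with the sudo wrapper lines left out and comparing it to the full count. This is only an illustration under assumptions, not how check_procs works internally and not a fix for it: count_without_sudo is a hypothetical function name, and it expects one command name per line on standard input.

```shell
# Hypothetical illustration: count command names read from stdin,
# skipping any line whose command name is exactly "sudo".
count_without_sudo() {
    awk '$1 != "sudo" { n++ } END { print n + 0 }'
}
```

Piping `ps -eo comm=` into this function and comparing the result with `ps -eo comm= | wc -l` shows how many sudo entries a given machine adds to the total.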
