The correct way to stop the nagios ndo2db daemon

nagios

What is the correct way to kill Nagios's ndo2db daemon?

When I shut down nagios and ndo2db, I run the following:

/etc/init.d/nagios stop
/etc/init.d/ndo2db stop

and I see the following in nagios.log:

[1311865619] Caught SIGTERM, shutting down...
[1311865619] Successfully shutdown... (PID=12422)
[1311865619] ndomod: Shutdown complete.
[1311865619] Event broker module '/usr/local/nagios/bin/ndomod.o' deinitialized successfully.

However /etc/init.d/ndo2db stop outputs the following message to the console:

ndo2db was not running… could not stop

If I run ps -ax | grep nagios, I still see an ndo2db process running:

12381 ? Ss 0:00 /usr/local/nagios//bin/ndo2db -c /usr/local/nagios//etc/ndo2db.cfg

I then have to kill it manually before restarting ndo2db, otherwise I get:

[root@nag01 nagios]# /etc/init.d/ndo2db start
Starting ndo2db:Could not bind socket: Address already in use
 done.
[root@nag01 nagios]#

Is there a cleaner way of doing this?

I'm running:

  • Nagios 3.2.3 built from source
  • NDO Utils 1.4b9 built from source
  • Centreon 2.2.1 Stable
  • CentOS 5.5 x64
  • MySQL 5.5 x64

Update:

One of the odd(?) things I've noticed is that when ndo2db and nagios are both running, I see two instances of ndo2db:

12753 ? Ss 0:00 /usr/local/nagios//bin/ndo2db -c /usr/local/nagios//etc/ndo2db.cfg
12792 ? S  0:00 /usr/local/nagios//bin/ndo2db -c /usr/local/nagios//etc/ndo2db.cfg

Is this normal? If so, is the stop part of the init.d script only killing one of the processes?

Best Answer

I found the culprit - it was Centreon's config file builder.

ndo2db has a lock_file setting which is missing from the Centreon config UI.

When Centreon generates the config files it also generates ndo2db.cfg - but without the lock_file configuration value.
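For reference, the missing line in ndo2db.cfg would look something like the fragment below. The path is the one I use later in this answer and is specific to my install; adjust it for yours. Note that adding it by hand only helps until Centreon regenerates the file.

```
# ndo2db.cfg (fragment)
# Path is an assumption based on a /usr/local/nagios source install.
lock_file=/usr/local/nagios/var/ndo2db.lock
```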

There's an open issue about this:

No lock_file field in Configuration - Centreon - ndo2db.cfg

Having spelunked through the source code: when ndo2db daemonises and there is no lock_file setting, it skips that step and carries on, so no lock file containing the PID is written.

This of course means that the stop function in the init script can't identify the ndo2db process ID, so it can't kill the process.
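To illustrate why a missing lock file produces the "was not running" message, here is a minimal sketch of how a stop routine that relies on a lock file typically behaves. This is not the init script shipped with NDO Utils; the function names read_pid_from_lock and stop_daemon are hypothetical and only show the general pattern.

```shell
# Hypothetical helper: print the PID stored in a lock file.
# Returns 1 when the file is missing, empty, or holds no digits.
read_pid_from_lock() {
    lock_file="$1"
    [ -s "$lock_file" ] || return 1
    pid=$(head -n 1 "$lock_file" | tr -cd '0-9')
    [ -n "$pid" ] || return 1
    printf '%s\n' "$pid"
}

# Hypothetical stop routine: without a usable lock file there is no PID
# to signal, so the daemon keeps running and keeps its socket bound.
stop_daemon() {
    lock_file="$1"
    if pid=$(read_pid_from_lock "$lock_file"); then
        kill "$pid" 2>/dev/null && rm -f "$lock_file"
    else
        echo "was not running (no usable lock file): could not stop" >&2
        return 1
    fi
}
```

With no lock file on disk, stop_daemon takes the else branch even though the daemon is alive, which matches the symptom in the question.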

Update:

To resolve this issue I manually added a new column to the cfg_ndo2db table in the centreon database:

ALTER TABLE `cfg_ndo2db`
    ADD COLUMN `lock_file` VARCHAR(255) NULL DEFAULT NULL;

I then populated it with the path of my ndo2db lock file:

UPDATE `cfg_ndo2db` SET `lock_file`='/usr/local/nagios/var/ndo2db.lock' WHERE `id`=1;

This forces Centreon to write the lock_file setting each time the config is generated. It also appears to survive upgrades, though I'd always check the database upgrade scripts in case the same fix sneaks in undocumented.
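After regenerating the config, it is worth confirming that the setting actually made it into the file. A small sketch of such a check follows; has_lock_file is a hypothetical helper name, and the config path in the usage line is the one from my install.

```shell
# Hypothetical check: succeeds if the given config file contains an
# uncommented lock_file setting.
has_lock_file() {
    grep -Eq '^[[:space:]]*lock_file[[:space:]]*=' "$1"
}
```

Usage would be along the lines of: has_lock_file /usr/local/nagios/etc/ndo2db.cfg && echo "lock_file is set"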