Nagios.service start operation timed out. Terminating

centos7 nagios systemd

I have a CentOS 7 box on which I installed Nagios and then migrated all the config files from an old CentOS box.

Everything looks fine; nagios -v doesn't return any errors.

However, systemd is unable to start the service and gives me a timeout. Below is the output of systemctl -l status nagios.service:

● nagios.service - Nagios Network Monitoring
   Loaded: loaded (/usr/lib/systemd/system/nagios.service; disabled; vendor preset: disabled)
   Active: failed (Result: timeout) since Fri 2016-02-05 10:52:55 CET; 13min ago
     Docs: https://www.nagios.org/documentation/
  Process: 2259 ExecStart=/usr/sbin/nagios -d /etc/nagios/nagios.cfg (code=exited, status=0/SUCCESS)
  Process: 2257 ExecStartPre=/usr/sbin/nagios -v /etc/nagios/nagios.cfg (code=exited, status=0/SUCCESS)

Feb 05 10:52:52 nagios.adflux.net nagios[2261]: SERVICE ALERT: VM-CRO-JIRA2;Drive Space C:;CRITICAL;SOFT;1;CRITICAL - Socket timeout after 10 seconds
Feb 05 10:52:52 nagios.adflux.net nagios[2261]: SERVICE ALERT: ESXi-ls1;PING;WARNING;SOFT;1;PING WARNING - Packet loss = 33%, RTA = 80.47 ms
Feb 05 10:52:55 nagios.adflux.net systemd[1]: nagios.service start operation timed out. Terminating.
Feb 05 10:52:55 nagios.adflux.net nagios[2261]: Caught SIGTERM, shutting down...
Feb 05 10:52:55 nagios.adflux.net nagios[2268]: Caught SIGTERM, shutting down...
Feb 05 10:52:55 nagios.adflux.net nagios[2261]: Successfully shutdown... (PID=2261)
Feb 05 10:52:55 nagios.adflux.net nagios[2261]: Event broker module 'NERD' deinitialized successfully.
Feb 05 10:52:55 nagios.adflux.net systemd[1]: Failed to start Nagios Network Monitoring.
Feb 05 10:52:55 nagios.adflux.net systemd[1]: Unit nagios.service entered failed state.
Feb 05 10:52:55 nagios.adflux.net systemd[1]: nagios.service failed.

I found no further errors in the logs (or at least not where I looked; most probably I'm missing something here).

Running the command /sbin/nagios /etc/nagios/nagios.cfg starts the monitoring service and everything runs as expected. But this doesn't solve my issue, since Nagios isn't started as a daemon here and is tied to my shell. This suggests to me that the issue is not caused by Nagios but by systemd itself.

Any clue on that would be appreciated.

Many thanks.

Best Answer

It seems like Nagios is not properly forking into the background with the -d option, which is what systemd expects here due to Type=forking.

So systemd counts a non-fork as a timeout during start. That might be related to NERD, or to another problem.

You could run Nagios in the foreground by:
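To see what systemd is actually waiting for, it can help to look at the start-related directives in the unit file. With Type=forking, systemd treats the service as started once the ExecStart parent process exits, and if PIDFile= is set it uses that file to identify the main process. The helper below is only a sketch: show_start_settings is a made-up name, not part of systemd or Nagios, and it simply greps the unit file.

```shell
# Hypothetical helper (not part of systemd or Nagios): print the directives
# that determine how systemd decides the service has started.
# The pattern requires "=" right after the name, so ExecStartPre= is skipped.
show_start_settings() {
    grep -E '^(Type|PIDFile|ExecStart|TimeoutStartSec)=' "$1"
}
```

It could be called as show_start_settings /usr/lib/systemd/system/nagios.service, using the unit path shown in the status output above. Directives the unit does not set (for example TimeoutStartSec) are simply not printed.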

cp /usr/lib/systemd/system/nagios.service /etc/systemd/system/nagios.service
vim /etc/systemd/system/nagios.service
# remove the Type=forking line and the -d flag from the ExecStart command line
systemctl daemon-reload
systemctl restart nagios.service
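The manual edit in vim could also be scripted. This is a minimal sketch, assuming GNU sed (as on CentOS) and an ExecStart line like the one in the status output above, where -d sits between the binary and the config path. make_foreground is a hypothetical name, and it should be run on the copy under /etc/systemd/system, not on the packaged file.

```shell
# Hypothetical helper: edit a copied unit file in place so that Nagios
# stays in the foreground.
#  - delete the Type=forking line (systemd then uses its default type)
#  - drop the " -d " flag from the ExecStart= line only
make_foreground() {
    sed -i -e '/^Type=forking/d' \
           -e '/^ExecStart=/s/ -d / /' "$1"
}
```

For example: make_foreground /etc/systemd/system/nagios.service, followed by the daemon-reload and restart commands above. The ExecStartPre line with -v is left untouched, because the second expression only matches lines starting with ExecStart=.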

Nevertheless, there is a bug...