Nagios service running, web site error: Could not read host and service status information

nagios

Nagios was upgraded from 3.5.1 to 4.0.8

I wanted to ask this in the nagios support forum, but an hour after signing up I still have not received the confirmation email for my account…

nagios seems to run OK as a service, but the web CGIs are not working, and there are no errors in Apache's error.log or in nagios.log. I've checked permissions and looked at some of the C code that contains this error message:

Whoops!
Error: Could not read host and service status information!

The same error above appears for almost every menu on the left side of the main page for nagios.

nagios.log looks like this when the service is started and then stopped from the init script:

[1431102009] Nagios 4.0.8 starting... (PID=27779)
[1431102009] Local time is Fri May 08 13:20:09 ADT 2015
[1431102009] LOG VERSION: 2.0
[1431102009] qh: Socket '/usr/local/nagios/var/rw/query.sh' successfully initialized
[1431102009] qh: core query handler registered
[1431102009] nerd: Channel hostchecks registered successfully
[1431102009] nerd: Channel servicechecks registered successfully
[1431102009] nerd: Channel opathchecks registered successfully
[1431102009] nerd: Fully initialized and ready to rock!
[1431102009] wproc: Successfully registered manager as @wproc with query handler
[1431102009] wproc: Registry request: name=Core Worker 27785;pid=27785
[1431102009] wproc: Registry request: name=Core Worker 27786;pid=27786
[1431102009] wproc: Registry request: name=Core Worker 27782;pid=27782
[1431102009] wproc: Registry request: name=Core Worker 27781;pid=27781
[1431102009] wproc: Registry request: name=Core Worker 27783;pid=27783
[1431102009] wproc: Registry request: name=Core Worker 27784;pid=27784
[1431102009] Successfully launched command file worker with pid 27787
[1431102022] Caught SIGTERM, shutting down...
[1431102022] Successfully shutdown... (PID=27779)
[1431102022] Event broker module 'NERD' deinitialized successfully.

Running with -v is clean:

# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

Nagios Core 4.0.8
Copyright (c) 2009-present Nagios Core Development Team and Community Contributors
Copyright (c) 1999-2009 Ethan Galstad
Last Modified: 08-12-2014
License: GPL

Website: http://www.nagios.org
Reading configuration data...
   Read main config file okay...
   Read object config files okay...

Running pre-flight check on configuration data...

Checking objects...
        Checked 816 services.
        Checked 826 hosts.
        Checked 11 host groups.
        Checked 0 service groups.
        Checked 18 contacts.
        Checked 13 contact groups.
        Checked 61 commands.
        Checked 6 time periods.
        Checked 0 host escalations.
        Checked 0 service escalations.
Checking for circular paths...
        Checked 826 hosts
        Checked 0 service dependencies
        Checked 0 host dependencies
        Checked 6 timeperiods
Checking global event handlers...
Checking obsessive compulsive processor commands...
Checking misc settings...

Total Warnings: 0
Total Errors:   0

Things look okay - No serious problems were detected during the pre-flight check

Also, check_nagios says we're running OK:

# /usr/local/nagios/libexec/check_nagios /var/log/nagios.log 5 '/usr/local/nagios/bin/nagios'
NAGIOS OK: 8 processes, status log updated 11 seconds ago

One possibility is that the error means the CGIs can't access the nagios.cfg file. I've checked that every directory on the path to it is r-x for 'other' (to cover the apache user). In any case, if there were a permission issue, it should produce an Apache error. I've been working on this for a couple of hours and can't find the point of failure, or what changed.
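The permission check described above can be scripted rather than done by eye. This is a minimal sketch, not a full access check: it looks only at the 'other' permission bits (as the check above does) and ignores owner, group, ACLs and SELinux. The function name `path_blockers` is made up for illustration.

```python
import os
import stat


def path_blockers(path):
    """Return the components of `path` that 'other' users cannot use.

    Directories need the 'other' execute bit (to traverse them);
    the final file needs the 'other' read bit.
    Simplification: owner/group bits, ACLs and SELinux are ignored.
    """
    # Collect the path and all of its parents, up to the filesystem root.
    components = []
    current = os.path.abspath(path)
    while True:
        components.append(current)
        parent = os.path.dirname(current)
        if parent == current:
            break
        current = parent

    blockers = []
    for component in reversed(components):
        mode = os.stat(component).st_mode
        needed = stat.S_IXOTH if stat.S_ISDIR(mode) else stat.S_IROTH
        if mode & needed != needed:
            blockers.append(component)
    return blockers


if __name__ == "__main__":
    # Path taken from the question; an empty list means no 'other' blockers.
    print(path_blockers("/usr/local/nagios/etc/nagios.cfg"))
```

An empty list only says the 'other' bits are fine. It does not prove the apache user can read the file if something else (such as SELinux) is in the way.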

The main page also shows "Unable to get process status" under the Nagios Core logo. That comes from main.php calling statusjson.cgi. I'm not sure what it is looking at, but the page is blank when I manually run the same CGI query that main.php uses (cgi-bin/statusjson.cgi?query=programstatus). I've googled this and searched the nagios forums, but everyone else seems to have some log error(s) that give more clues.
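One way to take Apache out of the picture is to run the CGI directly with the standard CGI environment variables set, so that anything it prints to stdout or stderr is visible. The sketch below rests on assumptions: that the Nagios CGIs behave the same when started outside the web server, that the binary lives at /usr/local/nagios/sbin/statusjson.cgi, and that "nagiosadmin" is a valid web user. None of these is confirmed by this post, and the helper names are made up.

```python
import os
import subprocess


def cgi_environment(query, user):
    """Build an environment that mimics what a web server passes to a CGI."""
    env = dict(os.environ)
    env.update({
        "REQUEST_METHOD": "GET",   # standard CGI variable
        "QUERY_STRING": query,     # everything after the '?' in the URL
        "REMOTE_USER": user,       # the authenticated web user (assumed name)
    })
    return env


def run_cgi(cgi_path, query, user):
    """Run the CGI once and return (exit code, stdout, stderr)."""
    result = subprocess.run(
        [cgi_path],
        env=cgi_environment(query, user),
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stdout, result.stderr


if __name__ == "__main__":
    # Assumed install path and user name; adjust to the local setup.
    code, out, err = run_cgi(
        "/usr/local/nagios/sbin/statusjson.cgi",
        "query=programstatus",
        "nagiosadmin",
    )
    print("exit code:", code)
    print("stdout:", out)
    print("stderr:", err)
```

Running this as the apache user rather than root keeps the file permissions comparable to what the web server sees.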

I do have one anomaly…

I found another nagios.log which is being touched with just a couple of lines each time the service is started:

# cat /usr/local/nagios/var/nagios.log
[1431088940] Error: Cannot open main configuration file '/' for reading!
[1431088940] Error: Failed to process config file '/'. Aborting
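The bracketed numbers in these logs are Unix epoch timestamps, which makes it hard to see at a glance whether the stray log lines up with a particular start of the service. A minimal sketch that converts them and picks out the "Error:" lines (the helper names are made up for illustration):

```python
import re
from datetime import datetime, timezone

# Matches lines such as "[1431102009] Nagios 4.0.8 starting... (PID=27779)"
LOG_LINE = re.compile(r"^\[(\d+)\]\s+(.*)$")


def parse_log_line(line):
    """Return (UTC datetime, message), or None if the line has no timestamp."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    when = datetime.fromtimestamp(int(match.group(1)), tz=timezone.utc)
    return when, match.group(2)


def error_lines(lines):
    """Keep only the parsed entries whose message starts with 'Error:'."""
    parsed = (parse_log_line(line) for line in lines)
    return [entry for entry in parsed if entry and entry[1].startswith("Error:")]


if __name__ == "__main__":
    with open("/usr/local/nagios/var/nagios.log") as handle:
        for when, message in error_lines(handle):
            print(when.isoformat(), message)
```

For the lines quoted above, 1431088940 converts to 12:42:20 UTC on 2015-05-08, while the start logged in the other file (1431102009) is 16:20:09 UTC, which matches the 13:20:09 ADT shown there.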

Perhaps something wacky with the init or cfg files, but I can't find it.
As another test, I can su to nagios and run the daemon manually.

su - nagios
[nagios@atlas ~]$ /usr/local/nagios/bin/nagios /usr/local/nagios/etc/nagios.cfg

Nagios Core 4.0.8
Copyright (c) 2009-present Nagios Core Development Team and Community Contributors
Copyright (c) 1999-2009 Ethan Galstad
Last Modified: 08-12-2014
License: GPL

Website: http://www.nagios.org
Nagios 4.0.8 starting... (PID=23234)
Local time is Fri May 08 13:45:12 ADT 2015
nerd: Channel hostchecks registered successfully
nerd: Channel servicechecks registered successfully
nerd: Channel opathchecks registered successfully
nerd: Fully initialized and ready to rock!
wproc: Successfully registered manager as @wproc with query handler
wproc: Registry request: name=Core Worker 23235;pid=23235
wproc: Registry request: name=Core Worker 23236;pid=23236
wproc: Registry request: name=Core Worker 23237;pid=23237
wproc: Registry request: name=Core Worker 23238;pid=23238
wproc: Registry request: name=Core Worker 23239;pid=23239
wproc: Registry request: name=Core Worker 23240;pid=23240
Successfully launched command file worker with pid 23241

I hoped this would rule out anything odd in the init script. It does not touch /usr/local/nagios/var/nagios.log (as expected), but it does not change the error from the web CGIs either. Another clue: when nagios is started manually like this, I don't see any logging of hosts and status items on the screen. If I start it from the init script, the log shows some warnings about host performance and flapping, plus the usual nagios chatter, but when started from the command line as the nagios user it says no more than the above.

Best Answer

This question eventually did go to the nagios core support forum and it was resolved there.

http://support.nagios.com/forum/viewtopic.php?f=7&t=32795

In this particular case, we were missing the config entries for

state_retention status_file

but many other kinds of errors can also lead to the web interface error beginning with "Whoops!".
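A quick way to spot this kind of gap after an upgrade is to scan nagios.cfg for the directives you expect to be there. The sketch below is only illustrative: it assumes the two entries named above correspond to the `status_file` and `state_retention_file` directives in nagios.cfg, which is my reading and not something the forum thread is quoted as saying. The function names are made up.

```python
def read_directives(text):
    """Parse 'name=value' lines from a Nagios main config into a dict of lists."""
    directives = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, separator, value = line.partition("=")
        if separator:
            # Some directives (e.g. cfg_file) may repeat, so keep every value.
            directives.setdefault(name.strip(), []).append(value.strip())
    return directives


def missing_directives(text, required=("status_file", "state_retention_file")):
    """Return the required directive names that do not appear in the config."""
    found = read_directives(text)
    return [name for name in required if name not in found]


if __name__ == "__main__":
    with open("/usr/local/nagios/etc/nagios.cfg") as handle:
        print(missing_directives(handle.read()))
```

An empty list means both directives are present. It does not check that the files they point to exist or are readable by the web server user.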