Warning - running nagios did not exit in time


#1

Hello,

Server where nagios is located:

Red Hat Linux release 7.2 (Enigma)
Apache 2.2.3
gcc version 2.96 20000731 (Red Hat Linux 7.1 2.96-98 )

Server that I want to monitor (nrpe installed):

Fedora Core release 2 (Tettnang)
Apache 2.2.3
gcc version 3.3.3 20040412 (Red Hat Linux 3.3.3-7)

I have been using nagios v1.x for some time now on a different server (and network).
I am doing a new installation of the 2.5 release.
I am starting fresh; I am not trying to import anything.

I installed it from an RPM that I built by rebuilding the source RPM.
When I start nagios (/etc/init.d/nagios start), I get no errors.

In the nagios.log I see no errors:

[1158259211] Nagios 2.5 starting... (PID=23923)
[1158259211] LOG VERSION: 2.0
[1158259211] Finished daemonizing... (New PID=25591)

The file /var/log/messages has similar lines.

I used the minimal.cfg file, renamed it, and simply changed the definitions
for the services, hosts, contact, etc. I also changed the check_commands to use check_nrpe like this:

define command{
	command_name    check_nrpe
	command_line    /usr/lib/nagios/plugins/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
	}

define service{
	...
	check_command   check_nrpe!check_disk1
	}
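With these definitions, the service's check_command `check_nrpe!check_disk1` is split on `!`: the first piece names the command, and each later piece fills `$ARG1$`, `$ARG2$`, and so on, while `$HOSTADDRESS$` comes from the host definition. The sketch below approximates that substitution to show the command line Nagios ends up running. It is a simplification (real Nagios supports many more macros), `expand_check_command` is a hypothetical helper, and 192.0.2.10 is a placeholder address, not the host from this post.

```python
import re


def expand_check_command(command_line: str, check_command: str, host_address: str) -> str:
    """Approximate how Nagios expands a check_command like 'check_nrpe!check_disk1'.

    Simplified sketch: only $HOSTADDRESS$ and $ARGn$ are substituted.
    """
    # The text before the first '!' is the command name; each later piece is $ARG1$, $ARG2$, ...
    _name, *args = check_command.split("!")

    def substitute(match: "re.Match") -> str:
        macro = match.group(1)
        if macro == "HOSTADDRESS":
            return host_address
        if macro.startswith("ARG") and macro[3:].isdigit():
            index = int(macro[3:]) - 1
            # Arguments that were not supplied expand to an empty string in this sketch.
            return args[index] if 0 <= index < len(args) else ""
        # Macros this sketch does not know about are left untouched.
        return match.group(0)

    return re.sub(r"\$([A-Z0-9_]+)\$", substitute, command_line)


if __name__ == "__main__":
    # command_line from the check_nrpe definition above; 192.0.2.10 is a placeholder.
    line = "/usr/lib/nagios/plugins/check_nrpe -H $HOSTADDRESS$ -c $ARG1$"
    print(expand_check_command(line, "check_nrpe!check_disk1", "192.0.2.10"))
    # prints: /usr/lib/nagios/plugins/check_nrpe -H 192.0.2.10 -c check_disk1
```

Running the expanded line by hand on the Nagios server is a quick way to confirm that a command named check_disk1 is actually defined in the remote host's nrpe.cfg.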

I have the nrpe daemon running under xinetd on the remote host, with the IP of
the nagios server in the allowed_hosts setting, and I have verified that it is
running with telnet (telnet localhost 5666), getting a connection
from both servers (the nagios server and the server I want to monitor).
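A telnet connection to port 5666 only shows that something accepts TCP connections on that port. When NRPE runs under xinetd, the allowed_hosts setting in nrpe.cfg is not used; access is controlled by xinetd itself (typically an only_from line in the xinetd service file), which can accept a connection and then close it if the client is not allowed. The sketch below is a scripted version of the telnet test with a timeout; `tcp_port_open` is a hypothetical helper, and a True result means "the port is open", not "NRPE will answer the Nagios server".

```python
import socket


def tcp_port_open(host: str, port: int = 5666, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Equivalent to `telnet host 5666`: it proves a listener exists, not that
    NRPE (or xinetd's access rules) will accept requests from this machine.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, host unreachable, and so on.
        return False
```

For example, run `tcp_port_open("192.0.2.10")` on the Nagios server, with the monitored host's real address in place of that placeholder. A more direct test from the same machine is `/usr/lib/nagios/plugins/check_nrpe -H <address>` with no `-c`, which should print the NRPE version if the daemon accepts the Nagios server.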

At first, I had external commands (check_external_commands) disabled. I could use the web
interface, but all services were PENDING.
And when I tried to stop nagios (/etc/init.d/nagios stop), I got
an error (the only error I saw in all this mess):

Stopping network monitor: nagios
Waiting for nagios to exit . . . . . . . . . . .
Warning - running nagios did not exit in time

A leftover nagios process keeps running, and I have to run “killall -9
nagios” to stop it; otherwise I end up with multiple copies running.
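The nagios.log lines above show the PID changing when nagios daemonizes (23923, then 25591), so a stop script has to wait on the PID that the daemon itself records, which nagios writes to the file named by lock_file in nagios.cfg. One possible cause of this warning, not confirmed anywhere in this thread, is an init script that watches a different PID file than the one nagios writes. The sketch below reads a PID from such a file and checks whether that process is still alive using signal 0; `pid_from_lock_file` and `process_alive` are hypothetical helpers, and the path you pass in is whatever your configuration uses.

```python
import os
from typing import Optional


def pid_from_lock_file(path: str) -> Optional[int]:
    """Return the PID recorded in a lock/PID file, or None if it cannot be read."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        # Missing or unreadable file, or contents that are not a number.
        return None


def process_alive(pid: int) -> bool:
    """Check whether a process exists; signal 0 performs the checks without sending anything."""
    if pid <= 0:
        # 0 and negative values address process groups, not a single process.
        return False
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user (e.g. nagios vs. your login).
        return True
    return True
```

Comparing the PID from the lock_file path with the PIDs listed by `ps awux | grep nagios` shows whether the process the init script waits on is the one that is actually still running.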

I then enabled external commands and changed the permissions of the “rw” directory
according to the documentation
(nagios.sourceforge.net/docs/2_0/commandfile.html). Now the
web interface says nagios is not running, even though I restarted Apache and
nagios.
The error I get is: "Whoops! Error: Could not read host and service status information!"
However, nagios IS running (ps awux | grep nagios shows 3 processes).
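The command-file documentation referenced above sets up the rw directory so that it is group-owned by the command group, gives both owner and group read/write/search permission, and sets the setgid bit (chmod g+s) so that the command file created inside inherits the directory's group. The sketch below checks an existing directory against those bits. `check_command_dir` is a hypothetical helper; the path and group name are whatever your installation uses, since an RPM build may place the directory somewhere other than the documented location.

```python
import grp
import os
import stat
from typing import List


def check_command_dir(path: str, group: str) -> List[str]:
    """Compare a directory with the group/rwx/setgid layout described for the
    Nagios external command directory. Returns a list of problems (empty if none)."""
    st = os.stat(path)
    if not stat.S_ISDIR(st.st_mode):
        return [f"{path} is not a directory"]

    problems = []
    try:
        actual_group = grp.getgrgid(st.st_gid).gr_name
    except KeyError:
        # The gid has no entry in the group database; report it numerically.
        actual_group = str(st.st_gid)
    if actual_group != group:
        problems.append(f"group is {actual_group!r}, expected {group!r}")

    # Owner and group both need read, write and search (execute) permission.
    for bits, label in ((stat.S_IRWXU, "owner rwx"), (stat.S_IRWXG, "group rwx")):
        if st.st_mode & bits != bits:
            problems.append(f"missing {label}")

    # setgid makes files created inside the directory inherit its group.
    if not st.st_mode & stat.S_ISGID:
        problems.append("setgid bit not set")
    return problems
```

An empty list means the directory matches that layout; the web server user still has to be a member of the group for the CGIs to write commands.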

Apache runs as the user “daemon”, so I added both nagios and daemon to the
secondary group nagiocmd.
The permissions of the directory are:

When I run nagios -v /etc/nagios/nagios.cfg, I get no warnings and no
errors ("Things look okay").
I have 6 services, one host, one host group, one contact, one contact
group, 9 commands and one timeperiod.

I’ve disabled state retention, as suggested in this forum for services and
hosts stuck in “PENDING”, but it made no difference. I would very much like to force a
check using the CGI, but I am having trouble there too.

The log directory does not contain a status.log file; I do not know if
that is relevant.
I tried starting the daemon by hand (nagios -d /etc/nagios/nagios.cfg), but
I got no error messages there either.
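On the status.log question: Nagios 1.x wrote its status to status.log, while Nagios 2.x writes it to the file named by the status_file directive in nagios.cfg (status.dat in the sample configuration). The CGIs show "Could not read host and service status information" when they cannot read that file, and they find nagios.cfg through main_config_file in cgi.cfg. It can therefore help to print the relevant directives from the exact file the daemon was started with. The sketch below is a minimal parser for nagios.cfg-style `name=value` lines; `parse_main_config`, `relevant_settings`, and the choice of directives in `INTERESTING` are illustrative assumptions, not part of Nagios.

```python
from typing import Dict, List

# Directives related to the symptoms in this thread (an illustrative selection).
INTERESTING = ("status_file", "lock_file", "check_external_commands", "command_file")


def parse_main_config(text: str) -> Dict[str, List[str]]:
    """Parse nagios.cfg-style 'name=value' lines into {name: [values...]}.

    Directives such as cfg_file may appear several times, so values are kept as lists.
    """
    settings: Dict[str, List[str]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, '#' comment lines and anything that is not name=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        settings.setdefault(name.strip(), []).append(value.strip())
    return settings


def relevant_settings(settings: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Pick out the INTERESTING directives; an empty list means the directive is not set."""
    return {name: settings.get(name, []) for name in INTERESTING}
```

For example, `relevant_settings(parse_main_config(open("/etc/nagios/nagios.cfg").read()))` shows where status is being written, which can then be compared with what the web server user is able to read.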

I’ve searched Google for the only error message (the one in the title of this
post) but only found references to the init script.
I went over the parts of the documentation that seem related to my
problem, and I’ve looked through the nagios FAQ and the forum.
So, in summary: nagios cannot be stopped properly, the services are
never checked, a log file may be missing, there are no really
useful error messages, and I am losing patience!

I think I have done everything I possibly can, but I must be missing
something that is too obvious and too simple for me to see, or
something related to information I am not aware of.

Any pointers would be appreciated.

Thank you

Christian Roy


#2

I have removed the RPM installation of 2.5 and installed from the source code.
All is working great now.