After rebooting the Nagios system, all hosts are unreachable


#1

Hi,

I have a problem that I don’t know how to deal with. I’ve got the standard script in /etc/init.d (I’m using Solaris) so that Nagios starts after the system reboots. The problem is that even though the daemon starts, all hosts but 3 out of 15 are unreachable, and the status information says “No output!”.

The hosts stay like that even if I schedule a new check. Only when I reload Nagios does it start checking the systems again, and then all of them appear up and running.

How can I make Nagios check the systems again after the system reboots? I use the check_host_alive command with check_ping.

Thanks a lot for your help.


#2

So Nagios actually starts, but somehow in the wrong way? What happens if you issue /etc/init.d/nagios restart?
Are you possibly using an old version of the Nagios init script after an update?

Luca


#3

Well, restart works fine. The problem is rebooting the whole system, and I don’t know why.

As for the possibility of having an old version… I’m afraid not. I only installed one version (2.0b3) and there haven’t been any updates…


#4

Strange situation…
When the system is rebooted, is Nagios actually working?
Do the check times update in the interface?
Is /etc/init.d/nagios correctly linked in /etc/rc3.d?
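
One way to check the rc3.d question is to list the start entries for the service. This is only a minimal sketch: the function name rc_start_links is made up for illustration, and it assumes the usual S##name naming for start scripts in the rc directory.

```shell
# List start (S##...) entries in an rc directory whose name mentions a service.
# rc_start_links is a hypothetical helper, not part of Nagios or Solaris.
rc_start_links() {
    rc_dir=$1      # e.g. /etc/rc3.d
    service=$2     # e.g. nagios
    # Keep only start scripts, then only those that mention the service.
    ls "$rc_dir" 2>/dev/null | grep '^S[0-9]' | grep "$service"
}
```

Calling `rc_start_links /etc/rc3.d nagios` should print something like an S##nagios entry if the script is linked; no output would suggest the link is missing.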

Luca


#5

Yes, it updates the interface (for example if I schedule a check). About the link in rc3.d: the process is running, and I compared the start time of that process with the time of the reboot and it’s correct… it’s really weird…


#6

The thing is that every time the system is rebooted, the same hosts stay up and the same ones are unreachable (I mean that hosts A, B and C are always up and the others are unreachable…). It seems as if I configured something differently between hosts… but I don’t get what…


#7

Are you using retain_state_information, use_retained_program_state and use_retained_scheduling_info?
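
A quick way to answer this is to look the directives up in the main config file. The sketch below is only an illustration: show_retention_settings is a made-up helper name, the config path in the usage line is an assumption, and state_retention_file is included because it names the file Nagios reloads retained state from.

```shell
# Print the state-retention directives found in a Nagios main config file,
# and say so when one is missing or commented out.
# show_retention_settings is a hypothetical helper name.
show_retention_settings() {
    cfg=$1
    for key in retain_state_information use_retained_program_state \
               use_retained_scheduling_info state_retention_file; do
        # Only uncommented "key=value" lines at the start of a line count.
        grep "^${key}=" "$cfg" || echo "${key} is not set in ${cfg}"
    done
}
```

For example: `show_retention_settings /usr/local/nagios/etc/nagios.cfg` (adjust the path to the local installation).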

Clipper


#8

Look through the nagios.log file. It might tell you something. Check your file permissions, especially in var.

It almost sounds like when the system is booted, nagios is reading an old cache file (that hasn’t changed for many days due to bad permissions).
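
The permission check can be sketched roughly like this. The helper name not_owned_by, the user name nagios and the var path in the usage line are all assumptions, not taken from this installation.

```shell
# List everything under a directory that is NOT owned by the given user.
# not_owned_by is a hypothetical helper; user and path are assumptions.
not_owned_by() {
    owner=$1   # user name or numeric UID the files should belong to
    dir=$2     # directory to scan, e.g. the Nagios var directory
    find "$dir" ! -user "$owner" -print
}
```

Running `not_owned_by nagios /usr/local/nagios/var` and getting any output (for instance a root-owned cache or log file) would point at the kind of stale-file problem described above. Comparing file dates with `ls -l` in the same directory shows whether a file has stopped being updated.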

Nobody should be using the beta versions of Nagios unless they like to troubleshoot or have a particularly dire need for them. Once you have become an expert with Nagios v1.2, then yes, have fun and troubleshoot.

My suggestion is to move to the stable v1.2 and leave the debugging to the beta testers. We are production users, and need to have something working NOW, not tomorrow.


#9

Hi,

First of all, thanks to everybody for your help. I think I've fixed the problem. I was using the standard init script, and even though it performs a "su <usernagios>", it accesses a file that needs to be accessed as <usernagios> before that command takes effect. Just moving the "su" command before that file access makes it work.
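
For anyone hitting the same thing, the idea can be sketched as below. This is not the actual init script: the variable names, the default user and path, and the choice of nagios.log as the file being touched are all assumptions for illustration, and it assumes a POSIX `id` is on the PATH.

```shell
# Hypothetical names and defaults: adjust to the local installation.
NAGIOS_USER=${NAGIOS_USER:-nagios}
NAGIOS_VAR=${NAGIOS_VAR:-/usr/local/nagios/var}

# Run one command line as the Nagios user (directly if we already are that user).
run_as_nagios() {
    if [ "$(id -un)" = "$NAGIOS_USER" ]; then
        sh -c "$1"
    else
        su - "$NAGIOS_USER" -c "$1"
    fi
}

# Touch runtime files as the Nagios user BEFORE anything else uses them,
# so that none of them is created or modified as root during boot.
prepare_runtime_files() {
    run_as_nagios "touch '$NAGIOS_VAR/nagios.log'"
}
```

In an init script, the start branch would call prepare_runtime_files first and only then launch the daemon, which mirrors moving the "su" ahead of the file access.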

So once again, thanks a lot.