I have the following scenario:
One of my hosts is running (or rather, not running) three failed services. Nagios detects them as “unknown”, because they are SNMP checks that go unanswered, and all is ‘fine’ (ehrm … whatever).
Now someone reboots the machine. The host goes down, which is also detected correctly by Nagios.
As the machine comes up again, the state change is not reported. Nagios still sees the host as down, although it is actually up and working (except for the things Nagios checks there).
If I add a ping check to the host, everything works as expected.
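For illustration, a host-level ping check of this kind might look roughly like the sketch below. It assumes the stock `check-host-alive` command and the `generic-host` template from the Nagios sample configuration are available; the host name and address are placeholders, not my real values.

```
# Sketch only: host_name and address are placeholders.
# Assumes the "generic-host" template and the "check-host-alive"
# command from the Nagios sample configuration are defined.
define host {
    use                 generic-host
    host_name           myhost
    address             192.0.2.10
    # Host state is decided by this ping-based check,
    # independently of the SNMP service checks.
    check_command       check-host-alive
    max_check_attempts  3
}
```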
It looks like Nagios checks the services, finds out they’re no good and simply assumes the host is still down.
I’d think the logic should be the other way around - for a host that is recorded as “down”, there should be a periodic check to see whether it is up again, and only then should the services be checked.
Of course, I may be doing something fundamentally wrong. Can anybody enlighten me here?
Thanks in advance!