Nagios Stalling


#1

I have an interesting issue. I have Nagios configured for distributed monitoring; however, not all of the distributed hosts are up and running yet. In this case Nagios processes checks up to a point, then stops updating, rescheduling, and checking. I have it set to check for freshness and orphaned checks, but my guess is that since these services are in an ambiguous state they aren’t covered by those settings.

The problem is that Nagios effectively stops checking services and hosts, and the command pipe gets quite large after a while (from the few that are still checking in), which understandably brings the machine to its knees.

My question is this: Has anybody else seen this and is there a way to get around it until all the servers are up without having to have one machine actively check everything and report to the central server?


#2

When a service check fails, Nagios will stop doing anything else until its “host check” has completed.
So, what is the host check for the hosts whose services are failing?
If you have not completed your setup, you should disable acceptance of passive checks and disable host checks for those devices.
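
If you would rather do that at runtime than edit the object definitions, something along these lines might work. This is only a rough sketch: it builds lines in the standard external command format ("[timestamp] COMMAND;arguments") for the host in question. The host name, the service list, and the pipe path in the comment are made-up placeholders, and DISABLE_PASSIVE_HOST_CHECKS may not be available on older Nagios versions, so check the external command list for your release first.

```python
import time


def external_command(name, *args, now=None):
    """Format one line in the Nagios external command syntax."""
    timestamp = int(time.time() if now is None else now)
    return "[{}] {}\n".format(timestamp, ";".join((name,) + args))


def quiesce_host_commands(host_name, service_descriptions, now=None):
    """Build commands that stop host checks and passive results for one host.

    host_name and service_descriptions must match your own object
    definitions; the values used in the example below are hypothetical.
    """
    lines = [
        # Stop active host checks for this host.
        external_command("DISABLE_HOST_CHECK", host_name, now=now),
        # Stop accepting passive host check results for this host.
        external_command("DISABLE_PASSIVE_HOST_CHECKS", host_name, now=now),
    ]
    for description in service_descriptions:
        # Stop accepting passive results for each listed service.
        lines.append(
            external_command(
                "DISABLE_PASSIVE_SVC_CHECKS", host_name, description, now=now
            )
        )
    return lines


def send_commands(lines, pipe_path):
    """Append the command lines to the Nagios external command file."""
    # pipe_path is whatever command_file points to in your nagios.cfg,
    # for example /usr/local/nagios/var/rw/nagios.cmd (an assumption).
    with open(pipe_path, "w") as pipe:
        pipe.writelines(lines)
```

For example, `quiesce_host_commands("remote01", ["PING"])` returns three lines you can pass to `send_commands` together with your command file path. Once the remote server is actually up, the matching ENABLE_ commands turn everything back on.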