Nagios stops active checks

Hi,

I’m running Nagios 3.2.0 with NDO 1.4b7. This instance mixes active and passive checks and receives events from another Nagios instance (50 hosts, 500 services). Since last week, Nagios has been stopping its active checks: both the ‘last check’ and the ‘next scheduled check’ times are in the past. The only way to recover and get active checks running again is to restart Nagios, but after a few hours the problem occurs again.
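To spot the symptom without watching the web interface, something like the following could scan the status file for services whose next scheduled check is already overdue. This is only a minimal sketch: it assumes the usual `servicestatus { key=value ... }` block layout of `status.dat` and the field names `active_checks_enabled` and `next_check`; the function names and the 300-second grace period are made up for illustration.

```python
import time

def parse_status_blocks(text):
    """Yield (block_type, fields) for each 'name { key=value ... }' block."""
    block_type, fields = None, {}
    for raw in text.splitlines():
        line = raw.strip()
        if block_type is None:
            if line.endswith("{"):
                block_type, fields = line[:-1].strip(), {}
        elif line == "}":
            yield block_type, fields
            block_type = None
        elif "=" in line:
            key, _, value = line.partition("=")
            fields[key] = value

def stale_active_checks(text, now=None, grace=300):
    """Return (host, service) pairs whose next_check is overdue by more than grace seconds."""
    now = time.time() if now is None else now
    stale = []
    for block_type, f in parse_status_blocks(text):
        if block_type != "servicestatus":
            continue
        if f.get("active_checks_enabled") != "1":
            continue  # passive-only services are never scheduled
        next_check = int(f.get("next_check", "0"))
        if 0 < next_check < now - grace:
            stale.append((f.get("host_name"), f.get("service_description")))
    return stale
```

It would be fed the contents of whatever file `status_file` points to in nagios.cfg; if the returned list keeps growing, active checks have probably stopped being scheduled.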

As a workaround I have set up a cron job that restarts Nagios every 4 hours, but that can’t be a final solution.

Has anyone already faced this problem and fixed it? I’m running multiple Nagios instances and I see this problem only with this one.

Thanks in advance for any input.

neofeet

Are you using nagios restart? Try stopping Nagios, searching for any Nagios processes left over and killing them, then starting Nagios again.

See what happens and avoid using “restart”.
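The "search for leftover processes" step, run after the stop and before the start, could be sketched like this. It assumes a Linux-style `ps -eo pid,ppid,comm` and that the daemon's command name is exactly `nagios`; both function names are hypothetical.

```python
import subprocess

def nagios_processes(ps_output):
    """Return (pid, ppid) for every process whose command name is 'nagios'."""
    procs = []
    for line in ps_output.splitlines()[1:]:  # skip the header line
        parts = line.split(None, 2)
        if len(parts) == 3 and parts[2].strip() == "nagios":
            procs.append((int(parts[0]), int(parts[1])))
    return procs

def current_nagios_processes():
    """Run ps and list any nagios processes still alive."""
    out = subprocess.run(
        ["ps", "-eo", "pid,ppid,comm"],
        capture_output=True, text=True, check=True,
    ).stdout
    return nagios_processes(out)
```

If `current_nagios_processes()` returns anything after Nagios has been stopped, those PIDs are the leftovers to kill before starting it again.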

No, I do not use the ‘restart’ method. After stopping Nagios, no processes are running. During the next stop/start I’ll check whether any Nagios processes are still running.

neofeet

The “hung” processes are (ab)normally generated by the restart procedure. There’s probably something timing out too fast, leaving some orphaned Nagios threads which cause problems, though usually not a system that stops checking…

Try having a look in the logs: what’s the last executed check? Is the system date/time set correctly?

Those are the first things that come to mind. :)
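For the log and clock check, a rough sketch along these lines could pull the last entry out of nagios.log and show how old it is. It assumes the standard `[epoch] message` line format; the function names are hypothetical, and the sample line in the usage note is invented.

```python
import re
from datetime import datetime, timezone

# Nagios log lines start with a Unix timestamp in square brackets.
LOG_LINE = re.compile(r"^\[(\d+)\] (.*)$")

def last_log_entry(log_text):
    """Return (timestamp, message) of the last well-formed log line, or None."""
    last = None
    for line in log_text.splitlines():
        m = LOG_LINE.match(line)
        if m:
            last = (int(m.group(1)), m.group(2))
    return last

def describe_last_entry(log_text, now):
    """Summarise the last entry: readable UTC time, age in seconds, message."""
    entry = last_log_entry(log_text)
    if entry is None:
        return None
    ts, message = entry
    when = datetime.fromtimestamp(ts, tz=timezone.utc)
    return {"utc": when.isoformat(), "age_seconds": now - ts, "message": message}
```

For example, `describe_last_entry("[1262304060] SERVICE ALERT: web1;HTTP;CRITICAL;HARD;3;timeout", now=int(time.time()))` (with `import time`) gives the age of that entry. A negative `age_seconds` would mean the log is ahead of the system clock, which points at a date/time problem.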