Hi, I’ve got a Nagios instance with about 448 services, 150 of which are configured as follows:
check_period 7:30 - 20:00 (timeperiod config excluded)
normal_check_interval 15 (i.e. 15 minutes)
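For anyone trying to reproduce this, a configuration along the following lines should give the same setup. It is a hypothetical sketch, not the actual (excluded) config. The timeperiod name `daytime-0730-2000`, the host, service and command names, and the use of the `generic-service` template from the Nagios sample configuration are all assumptions, as is the timeperiod applying to every day of the week.

```
# Hypothetical sketch: all names below are placeholders, not the real config.
define timeperiod {
    timeperiod_name  daytime-0730-2000   ; hypothetical name
    alias            07:30 to 20:00 every day
    sunday           07:30-20:00
    monday           07:30-20:00
    tuesday          07:30-20:00
    wednesday        07:30-20:00
    thursday         07:30-20:00
    friday           07:30-20:00
    saturday         07:30-20:00
}

define service {
    use                    generic-service     ; assumes the sample template
    host_name              example-host        ; hypothetical
    service_description    example-check       ; hypothetical
    check_command          check_example       ; hypothetical
    check_period           daytime-0730-2000
    normal_check_interval  15                  ; minutes, with the default interval_length=60
}
```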
When Nagios starts, it correctly schedules the services round-robin across the 15-minute interval.
It stops checking the services at 8pm and reschedules them all to recheck at 7:30am.
After ALL of them are checked at 7:30am, Nagios sets the next_check_time of every one of the 150 services to exactly 15 minutes later.
This happens even with auto-rescheduling configured as follows in nagios.cfg:
auto_reschedule_checks=1
auto_rescheduling_window=900
auto_rescheduling_interval=30
This means all 150 checks are executed at the same time, which sends the check latency skyward, though the system does cope with it, lol.
And it keeps doing this every 15 minutes for the rest of the day.
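For readers less familiar with the auto-rescheduling options, here are the same three nagios.cfg settings with what each one controls. The comments paraphrase the nagios.cfg documentation as best understood here (both the window and the interval are in seconds) and are worth checking against the docs for your Nagios version.

```
# Auto-rescheduling settings quoted in this post, annotated.
# 1 = let Nagios automatically reschedule active checks to smooth them out over time.
auto_reschedule_checks=1
# Only checks due within the next 900 seconds (15 minutes) are considered
# when Nagios reschedules.
auto_rescheduling_window=900
# How often, in seconds, Nagios attempts a rescheduling pass.
auto_rescheduling_interval=30
```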
I’ve spent the last week or so, on and off, trying to resolve this issue, and have done the following:
[list]
[*]Implemented MRTG Nagios performance graphs
[*]Downloaded and installed the latest release candidate, Nagios 3.0rc3
[*]Played around with the configuration settings no end
[*]Searched through the documentation and forums trying to find an answer
[/list]
So for the moment I’ve changed the check_period to 24x7, which stops the issue. Another workaround is to restart Nagios, which forces the clumped checks to be rescheduled. Has anyone got a similar config with similar problems?
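The 24x7 workaround amounts to something like the sketch below. The `24x7` timeperiod mirrors the one shipped in the Nagios sample timeperiods.cfg; the template, host, service and command names are hypothetical placeholders, not the actual config.

```
# Hypothetical sketch of the 24x7 workaround; placeholder names throughout.
define timeperiod {
    timeperiod_name  24x7
    alias            24 Hours A Day, 7 Days A Week
    sunday           00:00-24:00
    monday           00:00-24:00
    tuesday          00:00-24:00
    wednesday        00:00-24:00
    thursday         00:00-24:00
    friday           00:00-24:00
    saturday         00:00-24:00
}

define service {
    use                    generic-service     ; assumes the sample template
    host_name              example-host        ; hypothetical
    service_description    example-check       ; hypothetical
    check_command          check_example       ; hypothetical
    check_period           24x7                ; workaround: check around the clock
    normal_check_interval  15                  ; unchanged, 15 minutes
}
```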
My Nagios configuration can be found at the following link: