I recently installed and set up Nagios on a server running AIX 5.3. The install went (more or less) smoothly, and I got things configured and running. Everything looks good at first when I start or restart Nagios.
I'm doing very simple checks, using check_ping as the "service" for each host, and check-host-alive to do the host checks.
The problem is that once a host goes down, the scheduling gets "off": several hosts don't get checked in time and drop out of the queue. They never get rescheduled, which is an obvious problem.
I've tried changing the scheduling parameters and following what the docs say (including using the "smart" option for the inter-check delay), but I can't find a way around this.
Here's the output of the -s switch:
Copyright (c) 1999-2005 Ethan Galstad (www.nagios.org)
Last Modified: 04-03-2005
Projected scheduling information for host and service
checks is listed below. This information assumes that
you are going to start running Nagios with your current
HOST SCHEDULING INFORMATION
Total hosts: 246
Total scheduled hosts: 0
Host inter-check delay method: SMART
Average host check interval: 0.00 sec
Host inter-check delay: 0.00 sec
Max host check spread: 30 min
First scheduled check: N/A
Last scheduled check: N/A
SERVICE SCHEDULING INFORMATION
Total services: 246
Total scheduled services: 246
Service inter-check delay method: USER-SUPPLIED VALUE
Inter-check delay: 0.30 sec
Interleave factor method: SMART
Average services per host: 1.00
Service interleave factor: 1
Max service check spread: 30 min
First scheduled check: Mon May 23 16:42:41 2005
Last scheduled check: Mon May 23 16:43:54 2005
CHECK PROCESSING INFORMATION
Service check reaper interval: 10 sec
Max concurrent service checks: 100
I have no suggestions - things look okay.
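As a quick sanity check on the service numbers above, here is a rough sketch that compares the reported first/last check times against what 246 services spaced 0.30 sec apart would give. It assumes the checks are spread evenly at the inter-check delay, which is a simplification and not necessarily how Nagios actually schedules them; the variable names are just mine for illustration.

```python
from datetime import datetime

# Values copied from the -s output above.
total_services = 246
inter_check_delay = 0.30  # seconds between consecutive initial checks

# Assumption: checks are evenly spaced, so N checks span (N - 1) delays.
expected_spread = (total_services - 1) * inter_check_delay

# Timestamps as printed in the SERVICE SCHEDULING INFORMATION block.
fmt = "%a %b %d %H:%M:%S %Y"
first = datetime.strptime("Mon May 23 16:42:41 2005", fmt)
last = datetime.strptime("Mon May 23 16:43:54 2005", fmt)
reported_spread = (last - first).total_seconds()

print("expected spread: %.1f sec" % expected_spread)
print("reported spread: %.1f sec" % reported_spread)
```

The two figures agree to within a second (the output only has one-second resolution), so the initial service schedule looks consistent with the 0.30 sec delay. The host section, on the other hand, shows 0 scheduled hosts out of 246.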
Here's my services.cfg:
Here's my "standard" host config (used as an include):
Thanks for any help!