Clean Tactical View, using Passive checks

groth · June 25, 2009, 6:04pm

I am using Nagios in a mixed local/remote monitoring situation.
Version 3.0.6
The remote Nagios performs normally, obsessing over both hosts and services and submitting the results to the main server (nothing special in the config; normal checks run every 5 minutes).
I want to use freshness checks to notify me if a service or machine has stopped responding.
If I configure the services or hosts with active checks disabled, I get ugly red lights on the tactical display.
For obvious reasons, the active checks are disabled (on the main Nagios only).
Like others, I don’t want those red warnings. I think they are a bit deceptive, because the services ARE being actively checked, just not locally.
I want green across the board, unless there is a problem.

I found for services, if I set

active_checks_enabled 1
normal_check_interval 0
on initial startup, the service is “marked” actively checked, but no check is scheduled. Using the documented settings, I have freshness checks working perfectly. (see below for my settings).

For hosts, I don’t have as much luck. If I use similar settings, then on initial startup the main server forces an active check immediately, which runs the “Freshness failed” routine.
It then schedules the next check based on the retry interval. When my passive check arrives, the second check (although no longer visible in the queue) still runs, again triggering a failure and scheduling another check. It vacillates between the two until it reaches a HARD failure. At that point no further active checks are scheduled, and the passive checks work correctly.
The only way I’ve found to avoid this startup problem is to mark a host’s active checks as disabled when it is first added and then, once passive checks start arriving, use the web interface to re-enable active checks (to get rid of the red lights on the tactical display).

I’m not interested in a modified tac.cgi that hides disabled service checks (I still want those to show up for the locally active services).

Is there a better way of doing this? Did I miss a config option somewhere?

Thanks,
GR

Config info:

nagios.cfg

check_service_freshness=1
service_freshness_check_interval=60
check_host_freshness=1
host_freshness_check_interval=60

templates.cfg

define host{
        name						remote_server

        notifications_enabled			1			; Host notifications are enabled
        event_handler_enabled		1			; Host event handler is enabled
        flap_detection_enabled		1			; Flap detection is enabled
        failure_prediction_enabled	1			; Failure prediction is enabled
        process_perf_data			1			; Process performance data
        retain_status_information		1			; Retain status information across program restarts
        retain_nonstatus_information	1			; Retain non-status information across program restarts
        passive_checks_enabled		1			; Passive host checks are enabled/accepted
        check_period				24x7			; By default, Windows servers are monitored round the clock
        retry_interval				1			; Schedule host check retries at 1 minute intervals
        max_check_attempts			3			; Check each server 3 times (max)
        notification_period			24x7			; Send notification out at any time - day or night
        notification_interval			120			; Resend notifications every 120 minutes
        notification_options			d,r 			; Only send notifications for specific host states
        contact_groups				admins		; Notifications get sent to the admins by default

        check_interval  				0			; We don't want to schedule a check as the results will be passive
        active_checks_enabled		0			; Active host checks are disabled to start
        check_freshness			1			; Freshness to make sure we receive messages
        freshness_threshold			899			; Check Freshness 14m 59s.  Will result in 15mins +/- 1 min on first, 15mins on second
        max_check_attempts			2			; Warning issued after 30mins without an update
        check_command			remote_host_stale
        register					0
}


define service{
        name						remote-service
        active_checks_enabled		1			; Active service checks are enabled
        passive_checks_enabled		1			; Passive service checks are enabled/accepted
        parallelize_check			1			; Active service checks should be parallelized (disabling this can lead to major performance problems)
        obsess_over_service			1			; We should obsess over this service (if necessary)
        notifications_enabled			1			; Service notifications are enabled
        event_handler_enabled		1			; Service event handler is enabled
        flap_detection_enabled		1			; Flap detection is enabled
        failure_prediction_enabled	1			; Failure prediction is enabled
        process_perf_data			1			; Process performance data
        retain_status_information		1			; Retain status information across program restarts
        retain_nonstatus_information	1			; Retain non-status information across program restarts
        is_volatile					0			; The service is not volatile
        check_period				24x7			; The service can be checked at any time of the day
        retry_check_interval			2			; Re-check the service every two minutes until a hard state can be determined
        contact_groups				admins		; Notifications get sent out to everyone in the 'admins' group
        notification_options			w,u,c,r		; Send notifications about warning, unknown, critical, and recovery events
        notification_interval			60			; Re-notify about service problems every hour
        notification_period			24x7			; Notifications can be sent out at any time

        normal_check_interval		0			; We don't want to schedule a check as the results will be passive
        check_freshness			1			; Freshness to make sure we receive messages
        freshness_threshold			899			; Check Freshness 14m 59s.  Will result in 15mins +/- 1 min on first, 15mins on second
        max_check_attempts			2			; Warning issued after 30mins without an update
        check_command			remote_service_stale
        register					0
}
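
The definitions of remote_host_stale and remote_service_stale are not included in the post. As a rough sketch only, commands of this kind might look something like the following, assuming they wrap the check_dummy plugin from the standard plugins package and that $USER1$ points at the plugin directory. The message text is illustrative, not taken from the original config.

```
# Hypothetical definitions; the original post does not show these commands.
# Assumes check_dummy is installed and $USER1$ is the plugin directory.
define command{
        command_name    remote_host_stale
        command_line    $USER1$/check_dummy 2 "CRITICAL: no passive host result received within the freshness threshold"
}

define command{
        command_name    remote_service_stale
        command_line    $USER1$/check_dummy 2 "CRITICAL: no passive service result received within the freshness threshold"
}
```

When a result goes stale, Nagios forces an active check using the object's check_command, even if active checks are disabled for that host or service, so a command that simply returns a fixed non-OK state is enough to flag the missing update.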