I am using Nagios in a mixed local/remote monitoring setup.
Version: 3.0.6
The remote Nagios instance runs normally, obsessing over both hosts and services and submitting the results to the main server (nothing special in the config; normal checks run every 5 minutes).
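How the results travel from the remote to the main server isn't stated here (send_nsca called from the remote's ocsp_command/ochp_command is a common choice). Either way, on the main server each result ends up as one line in the external command file. A minimal Python sketch of those lines, assuming the standard external-command syntax; the host and service names are hypothetical:

```python
import time


def _timestamp(now=None):
    # Nagios external commands start with a Unix timestamp in square brackets.
    return int(time.time() if now is None else now)


def service_result_command(host, service, return_code, output, now=None):
    """Build a PROCESS_SERVICE_CHECK_RESULT line (return_code: 0=OK ... 3=UNKNOWN)."""
    return (f"[{_timestamp(now)}] PROCESS_SERVICE_CHECK_RESULT;"
            f"{host};{service};{return_code};{output}\n")


def host_result_command(host, status_code, output, now=None):
    """Build a PROCESS_HOST_CHECK_RESULT line (status_code: 0=UP, 1=DOWN, 2=UNREACHABLE)."""
    return f"[{_timestamp(now)}] PROCESS_HOST_CHECK_RESULT;{host};{status_code};{output}\n"


# Hypothetical host and service names, fixed timestamp for illustration.
print(service_result_command("web01", "HTTP", 0, "HTTP OK", now=1234567890), end="")
print(host_result_command("web01", 0, "PING OK", now=1234567890), end="")
```

Every such line resets the freshness clock for that host or service on the main server, which is why freshness checking only fires when these stop arriving.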
I want to use freshness checks to notify me if a service or machine has stopped responding.
If I configure the services or hosts with active checks disabled, I get ugly red lights on the tactical display.
For obvious reasons, the active checks are disabled (on the main Nagios only).
Like others, I don’t want those red warnings. I think they are a bit misleading, because the services ARE being actively checked, just not locally.
I want green across the board, unless there is a problem.
I found that for services, if I set
active_checks_enabled 1
normal_check_interval 0
then on initial startup the service is marked as actively checked, but no check is scheduled. With the documented freshness settings, freshness checks work perfectly (see my settings below).
For hosts, I don’t have as much luck. If I use similar settings, on initial startup the main server immediately forces an active check, which runs the “Freshness failed” routine.
It then schedules the next check based on the retry interval. When my passive result arrives, the second check (although no longer visible in the queue) still runs, again triggering a failure and scheduling another check. It vacillates between the two until the host reaches a HARD failure state. At that point no further active checks are scheduled, and the passive checks work correctly.
The only way I’ve found to avoid this startup problem is to add a host with active checks disabled and then, once passive results start arriving, use the web interface to re-enable active checks (to get rid of the red lights on the Tactical display).
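The manual re-enable step could be scripted rather than clicked through. A minimal sketch, assuming the default source-install command-file path (check command_file in your nagios.cfg) and a hypothetical last_passive mapping that you would have to populate yourself; it only automates the workaround and does not change the startup behaviour:

```python
import time

# Assumed default path for a source install; use the command_file value from nagios.cfg.
COMMAND_FILE = "/usr/local/nagios/var/rw/nagios.cmd"


def enable_host_check_command(host, now=None):
    """Build the external command that enables active checks for one host."""
    ts = int(time.time() if now is None else now)
    return f"[{ts}] ENABLE_HOST_CHECK;{host}\n"


def commands_for_reporting_hosts(last_passive, now=None):
    """Return ENABLE_HOST_CHECK lines for hosts that have sent a passive result.

    last_passive (hypothetical) maps host name -> time of its latest passive
    result, or None if nothing has arrived yet. Gathering it is not shown.
    """
    return [enable_host_check_command(host, now)
            for host, ts in sorted(last_passive.items())
            if ts is not None]


def submit(commands, command_file=COMMAND_FILE):
    # The command file is a named pipe read by Nagios; one command per line.
    with open(command_file, "w") as f:
        f.writelines(commands)
```

Filtering on "has reported at least once" mirrors the manual rule of waiting for passive results before re-enabling active checks.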
I’m not interested in a modified tac.cgi that hides disabled service checks, since I still want those to show up for the locally checked services.
Is there a better way of doing this? Did I miss a config option somewhere?
Thanks,
GR
Config info:
nagios.cfg
check_service_freshness=1
service_freshness_check_interval=60
check_host_freshness=1
host_freshness_check_interval=60
templates.cfg
define host{
name remote_server
notifications_enabled 1 ; Host notifications are enabled
event_handler_enabled 1 ; Host event handler is enabled
flap_detection_enabled 1 ; Flap detection is enabled
failure_prediction_enabled 1 ; Failure prediction is enabled
process_perf_data 1 ; Process performance data
retain_status_information 1 ; Retain status information across program restarts
retain_nonstatus_information 1 ; Retain non-status information across program restarts
passive_checks_enabled 1 ; Passive host checks are enabled/accepted
check_period 24x7 ; Remote servers are monitored round the clock
retry_interval 1 ; Schedule host check retries at 1 minute intervals
max_check_attempts 3 ; Check each server 3 times (max)
notification_period 24x7 ; Send notification out at any time - day or night
notification_interval 120 ; Resend notifications every 2 hours
notification_options d,r ; Only send notifications for specific host states
contact_groups admins ; Notifications get sent to the admins by default
check_interval 0 ; We don't want to schedule a check as the results will be passive
active_checks_enabled 0 ; Active host checks are disabled to start
check_freshness 1 ; Freshness to make sure we receive messages
freshness_threshold 899 ; Check Freshness 14m 59s. Will result in 15mins +/- 1 min on first, 15mins on second
max_check_attempts 2 ; Warning issued after 30mins without an update
check_command remote_host_stale
register 0
}
define service{
name remote-service
active_checks_enabled 1 ; Active service checks are enabled
passive_checks_enabled 1 ; Passive service checks are enabled/accepted
parallelize_check 1 ; Active service checks should be parallelized (disabling this can lead to major performance problems)
obsess_over_service 1 ; We should obsess over this service (if necessary)
notifications_enabled 1 ; Service notifications are enabled
event_handler_enabled 1 ; Service event handler is enabled
flap_detection_enabled 1 ; Flap detection is enabled
failure_prediction_enabled 1 ; Failure prediction is enabled
process_perf_data 1 ; Process performance data
retain_status_information 1 ; Retain status information across program restarts
retain_nonstatus_information 1 ; Retain non-status information across program restarts
is_volatile 0 ; The service is not volatile
check_period 24x7 ; The service can be checked at any time of the day
retry_check_interval 2 ; Re-check the service every two minutes until a hard state can be determined
contact_groups admins ; Notifications get sent out to everyone in the 'admins' group
notification_options w,u,c,r ; Send notifications about warning, unknown, critical, and recovery events
notification_interval 60 ; Re-notify about service problems every hour
notification_period 24x7 ; Notifications can be sent out at any time
normal_check_interval 0 ; We don't want to schedule a check as the results will be passive
check_freshness 1 ; Freshness to make sure we receive messages
freshness_threshold 899 ; Check Freshness 14m 59s. Will result in 15mins +/- 1 min on first, 15mins on second
max_check_attempts 2 ; Warning issued after 30mins without an update
check_command remote_service_stale
register 0
}
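The timing in the template comments can be checked with a little arithmetic. A minimal sketch under a simplified model (assumption: a result is stale once its age exceeds freshness_threshold, staleness is only noticed on the next sweep every *_freshness_check_interval seconds, and each further attempt comes roughly one threshold later, as the comments describe; check latency is ignored):

```python
def stale_detection_window(freshness_threshold, freshness_check_interval):
    """Seconds after the last result at which a freshness sweep first flags it."""
    earliest = freshness_threshold                            # sweep lands just after the threshold
    latest = freshness_threshold + freshness_check_interval   # threshold passes just after a sweep
    return earliest, latest


def hard_state_window(freshness_threshold, freshness_check_interval, max_check_attempts):
    """Approximate time to a HARD state: each further attempt ~one threshold later."""
    earliest, latest = stale_detection_window(freshness_threshold, freshness_check_interval)
    extra = (max_check_attempts - 1) * freshness_threshold
    return earliest + extra, latest + extra


for label, (lo, hi) in [
    ("first stale detection", stale_detection_window(899, 60)),
    ("HARD state", hard_state_window(899, 60, 2)),
]:
    print(f"{label}: {lo / 60:.1f} to {hi / 60:.1f} minutes")
```

With the posted values (threshold 899, sweep interval 60, max_check_attempts 2) this prints roughly 15.0 to 16.0 minutes for the first detection and 30.0 to 31.0 minutes to a HARD state, in line with the comments in the templates.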