Howdy –
I am using an existing installation of Nagios. I am, for the most part, a Nagios newbie, but I'm very familiar with SNMP monitoring practices in general.
One problem I’m trying to tackle is that when a host goes down, I get service outages for it instead of a host outage. When I look at the web GUI, I see the following for all of my host configs. I’ve reformatted the output and taken a few things out to make it more readable, and I’ve highlighted the settings that look suspect to me:
Host Name: atest01
Max. Check Attempts: 3
**Check Interval: 0h 4m 0s**
Host Check Command: check-host-alive
Obsess Over: Yes
**Enable Active Checks: No**
**Enable Passive Checks: Yes**
**Check Freshness: No**
Freshness Threshold: Auto-determined value
Default Contact Groups: corpemail
Notification Interval: 48h 0m 0s
Notification Options: Down, Unreachable, Recovery
Notification Period: 24x7
Per the docs, setting a check interval for hosts is not recommended because, “among other reasons”, it’s bad for performance. Could my problem be one of those other reasons?
Having “Enable Active Checks” disabled also seems like a bad thing to me, since the docs describe “on-demand” checks as falling under the active checks category. If I'm reading that right, Nagios may never run the on-demand host check when a service fails, so it would never notice that the host itself is down.
I’m referencing the host options section of this page: nagios.sourceforge.net/docs/2_0/xodtemplate.html
Here is my generic host config:
define host{
name generic-host ; The name of this host template
notifications_enabled 1 ; Host notifications are enabled
event_handler_enabled 1 ; Host event handler is enabled
flap_detection_enabled 1 ; Flap detection is enabled
process_perf_data 1 ; Process performance data
retain_status_information 1 ; Retain status information across program restarts
retain_nonstatus_information 1 ; Retain non-status information across program restarts
check_command check-host-alive
contact_groups corpemail
check_interval 4
max_check_attempts 3
notification_interval 2880 ; 0=noRepeats, If set to Zero host escalation will not work
notification_period 24x7
notification_options d,u,r
register 0 ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL HOST, JUST A TEMPLATE!
}
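For reference, here is a rough sketch of the change I'm considering, not something I've tested. It assumes that `active_checks_enabled` and `passive_checks_enabled` are the host directives behind the “Enable Active Checks” and “Enable Passive Checks” fields in the GUI, which is how I read the docs page above. Note that my current template doesn't set `active_checks_enabled` at all, so I'm not certain the template is even where the “No” comes from.

```
define host{
name generic-host
# (existing directives from the template above stay as they are)
active_checks_enabled 1 ; assumption: allows active host checks, including on-demand ones
passive_checks_enabled 1 ; matches what the GUI already shows
# check_interval 4 ; candidate for removal, since the docs advise against setting it for hosts
register 0
}
```

If this is the wrong place to fix it, I'd like to know before I touch the production config.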
Here is my generic service config:
define service{
name generic-service ; The 'name' of this service template, referenced in other service definitions
active_checks_enabled 1 ; Active service checks are enabled
passive_checks_enabled 1 ; Passive service checks are enabled/accepted
parallelize_check 1 ; Active service checks should be parallelized (disabling this can lead to major performance problems)
obsess_over_service 1 ; We should obsess over this service (if necessary)
check_freshness 0 ; Default is to NOT check service 'freshness'
notifications_enabled 1 ; Service notifications are enabled
event_handler_enabled 1 ; Service event handler is enabled
flap_detection_enabled 1 ; Flap detection is enabled
process_perf_data 1 ; Process performance data
retain_status_information 1 ; Retain status information across program restarts
retain_nonstatus_information 1 ; Retain non-status information across program restarts
is_volatile 0
check_period 24x7
max_check_attempts 3
normal_check_interval 4
retry_check_interval 2
contact_groups rt-warn-q
notification_interval 2880 ; 0=norepeats, no-repeats disables escalations
notification_period 24x7
notification_options w,u,c,r
register 0 ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL SERVICE, JUST A TEMPLATE!
}
If anyone has any insight, I’d really appreciate it. Unfortunately, I don’t have a good “test” box to try this on, so I’d like some confirmation (or not) as to whether my suspicions are well founded.