Problems when listing multiple hosts within a single service


#1

Hello everyone,

My latest Nagios-related woe is in relation to service escalations.

I have recently found that where I have listed multiple hosts within a single service escalation def, the escalations do not appear to function as they should.

Before I detail the issue, here is a copy of the escalations I have in place for each of my service defs:

define serviceescalation {
host_name ,,…
service_description <srv_desc>
first_notification 1
last_notification 0
contact_groups admins
notification_interval 20
escalation_period workhours
}

define serviceescalation {
host_name ,,…
service_description <srv_desc>
first_notification 1
last_notification 0
contact_groups support_primary
notification_interval 20
}

define serviceescalation {
host_name ,,…
service_description <srv_desc>
first_notification 2
last_notification 0
contact_groups support_shadow
notification_interval 20
}

For each service escalation definition that lists multiple hosts, the alert notifications received by the support_shadow contact group are very erratic. Sometimes this group receives the 1st notification, other times the 2nd (as intended), and sometimes they receive the recovery notification without having received a critical notification. It appears to be quite random.

Admittedly, I probably don’t require the 2nd escalation definition that is listed above, yet I don’t believe this configuration would cause any problems to occur either, especially since I do not experience this issue for any of the numerous escalation defs that have just a single host defined.

I am aware that I can use the hostgroup_name directive to avoid listing multiple hosts, but for various reasons this is not something I can deploy right now. Besides, that would only be a workaround and not a fix.
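
For reference, the hostgroup-based variant I am referring to would look roughly like the sketch below. The hostgroup name windows-servers is made up for illustration and is not from my config; everything else mirrors the third escalation above.

define serviceescalation {
hostgroup_name windows-servers ; hypothetical hostgroup name, for illustration only
service_description <srv_desc>
first_notification 2
last_notification 0
contact_groups support_shadow
notification_interval 20
}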

Can anybody confirm whether they have a configuration similar to my own which works? Or do any of you have any pointers? I’ve been through the documentation and can’t find any information which helps.

Thanks!


#2

UPDATE: I can now confirm that the issue in question has nothing to do with having defined multiple hosts within a single service escalation def. I seem to have a problem with the service tests responsible for measuring CPU on my Windows hosts. The only problem is that I can’t find anything unique about these checks when compared to the hundreds of other checks which all use the same ‘generic-service’ def. Here are some relevant defs:

[blockquote]
define service {
name generic-service ; The ‘name’ of this service template, referenced in other service definitions
active_checks_enabled 1 ; Active service checks are enabled
passive_checks_enabled 1 ; Passive service checks are enabled/accepted
parallelize_check 1 ; Active service checks should be parallelized (disabling this can lead to major performance problems)
obsess_over_service 1 ; We should obsess over this service (if necessary)
check_freshness 0 ; Default is to NOT check service ‘freshness’
notifications_enabled 1 ; Service notifications are enabled
event_handler_enabled 1 ; Service event handler is enabled
flap_detection_enabled 0 ; Flap detection is disabled
process_perf_data 0 ; Do not process performance data
retain_status_information 1 ; Retain status information across program restarts
retain_nonstatus_information 1 ; Retain non-status information across program restarts
is_volatile 0
check_period 24x7
max_check_attempts 3
normal_check_interval 2 ; 2 minutes
retry_check_interval 1 ; 1 minute
contact_groups support_primary
notification_interval 20 ; 20 minutes
notification_period 24x7
notification_options w,u,c,r
register 0 ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL SERVICE, JUST A TEMPLATE!
}

define service {
service_description CPU
use generic-service
host_name
check_command check_win_cpuload!30!50
}

ESCALATIONS ARE AS DEFINED IN PREVIOUS POST

define contact {
contact_name
alias
service_notification_period 24x7
host_notification_period 24x7
service_notification_options c,r
host_notification_options d,r
service_notification_commands notify-by-sms
host_notification_commands host-notify-by-sms
email
pager
}

define contactgroup {
contactgroup_name support_primary
alias Primary 24 Hour Support Engineer
members
}

define contactgroup {
contactgroup_name admins
alias Systems Admins
members
}

define contactgroup {
contactgroup_name support_shadow
alias Shadow 24 Hour Support Backup
members
}

[/blockquote]

So the problem is that the ‘support_shadow’ group members are receiving the 1st notification outside of ‘workhours’, when they should only be receiving the 2nd and all subsequent notifications. This only appears to apply to the service test described above.