Hi,
I’m looking for some help configuring service dependencies in our Nagios setup. We’ve been using Nagios for almost a year to monitor our internal systems, and we’ve recently decided to start monitoring client systems as well. For internal systems we’ve stuck with NRPE for remote checks, but because each client has different security requirements we’ve taken a different approach for client systems: we initiate checks over reverse SSH tunnels.
This setup works, but the reverse tunnels are a little unstable at times, and when a tunnel goes ‘down’ all the checks for that host fail. I’ve set up a service check that reports whether a reverse tunnel (revt) is up or down. Ideally, if a revt is down I only want to receive a notification for the revt and not for the other services.
I’m finding the documentation a little confusing on the topic:
define servicedependency{
    host_name                       Host B
    service_description             Service D
    dependent_host_name             Host C
    dependent_service_description   Service F
    execution_failure_criteria      o
    notification_failure_criteria   w,u
}
Does this mean that Service D is dependent on Service F, or vice versa?
Also, the service dependency is only evaluated against the most recent hard state of the service being depended upon (revt in my case). So I have to make sure that revt goes hard critical before the other services go hard critical. Is that correct?
Finally, suppose the other services go soft critical first and then revt goes hard critical, so that active checks for the other services are suppressed before they themselves reach hard critical. When revt recovers, only a recovery notification for revt should be sent. Is that right?
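For what it’s worth, my current reading, which I’d like confirmed, is that host_name/service_description name the service being depended upon, and the dependent_* directives name the service that depends on it. The same example with that reading written in as comments (the comments are mine, not from the documentation):

```
define servicedependency{
    host_name                       Host B      ; host of the service being depended upon
    service_description             Service D   ; the service being depended upon (the "master")
    dependent_host_name             Host C      ; host of the dependent service
    dependent_service_description   Service F   ; the service that depends on Service D
    execution_failure_criteria      o           ; master states that suppress checks of F (o = OK)
    notification_failure_criteria   w,u         ; master states that suppress notifications for F (w = WARNING, u = UNKNOWN)
}
```

If that reading is right, Service F on Host C depends on Service D on Host B, not the other way round.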
Thanks a bunch for any possible clarification/help. I’ve been fiddling with this for a few weeks now and have been struggling to reduce the false criticals.
Here’s an example of my current configuration, in case I’m not being clear about what I’m trying to achieve:
define service{
    use                     generic-service
    host_name               client
    service_description     revt
    check_command           check_revt
    notification_interval   0
    check_interval          5
    max_check_attempts      2
}
define service{
    use                     generic-service
    host_name               client
    service_description     load
    check_command           check_by_ssh!load
    notification_interval   0
    check_interval          30
}
define servicedependency{
    host_name                       client
    service_description             revt
    dependent_host_name             client
    dependent_service_description   load
    execution_failure_criteria      c,u
    notification_failure_criteria   c,u
}
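One variant I’m considering, to make revt reach hard critical before load does, is to tighten revt’s check schedule and stretch out load’s retries. This is only a sketch: the interval values are illustrative rather than tested, I’m assuming the default interval_length of 60 seconds (so intervals are in minutes), and I haven’t checked what generic-service already sets for retry_interval or max_check_attempts.

```
define service{
    use                     generic-service
    host_name               client
    service_description     revt
    check_command           check_revt
    notification_interval   0
    check_interval          1    ; illustrative: notice a dead tunnel within about a minute
    retry_interval          1    ; illustrative: recheck one minute after the first failure
    max_check_attempts      2    ; hard critical on the second failed check
}
define service{
    use                     generic-service
    host_name               client
    service_description     load
    check_command           check_by_ssh!load
    notification_interval   0
    check_interval          30
    retry_interval          2    ; illustrative: space out the rechecks
    max_check_attempts      3    ; hard critical no sooner than about 4 minutes after the first failure
}
```

With these numbers revt should go hard critical roughly 2 minutes after the tunnel drops in the worst case, while load needs at least about 4 minutes of failures, although check latency could still narrow that margin. I believe there is also a soft_state_dependencies option in the main nagios.cfg that makes dependencies use soft states, but I haven’t tried it.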