Service dependencies

Hi,

I’m looking for a little assistance with configuring some service dependencies in our Nagios setup. We’ve been using Nagios for almost a year now for our internal systems, and we’ve recently decided to start monitoring client systems as well. For our internal systems we’ve stuck with NRPE for remote checks, but we’ve taken a different approach for our client systems because each client has different security requirements: we initiate checks over reverse SSH tunnels.

This setup is working, but the reverse tunnels are a little unstable at times. If a reverse tunnel goes ‘down’, all the checks for that host fail. I’ve set up a service check that reports whether a reverse tunnel (revt) is up or down. Ideally, if a revt is down I only want to receive a notification for the revt and not for the other services.

I’m finding the documentation a little confusing on the topic:

define servicedependency{
	host_name			Host B
	service_description		Service D
	dependent_host_name		Host C
	dependent_service_description	Service F
	execution_failure_criteria	o
	notification_failure_criteria	w,u
	}

Does this mean that Service D is dependent on Service F, or vice versa?
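
As far as I can tell from the Nagios object definition documentation, the plain host_name/service_description pair names the service being depended upon, and the dependent_* pair names the service that depends on it. Under that reading, the documentation example would annotate as follows (the comments are my own interpretation, not part of the docs):

define servicedependency{
	host_name			Host B		; host of the service being depended upon
	service_description		Service D	; the service being depended upon
	dependent_host_name		Host C		; host of the dependent service
	dependent_service_description	Service F	; the service that depends on Service D
	execution_failure_criteria	o		; criteria letter: o = OK
	notification_failure_criteria	w,u		; criteria letters: w = WARNING, u = UNKNOWN
	}

If that reading is right, Service F on Host C depends on Service D on Host B, and notifications for Service F are held back while Service D is in a WARNING or UNKNOWN state.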

Also, the service dependency is only checked against the most recent hard state of the service being depended upon. Therefore I have to make sure that revt is hard critical before the other services become hard critical (is that correct?).

Finally, suppose the other services go soft critical before revt goes hard critical, and active checks for those services are then disabled before they reach a hard critical state. When the revt recovers, Nagios should only send a recovery notification for revt (is that right?).

Thanks a bunch for any possible clarification/help. I’ve been fiddling with this for a few weeks now and have been struggling to reduce the false criticals.

Here’s an example of my current configuration, in case I’m not being clear about what I am trying to achieve:

define service{
          use                 generic-service
          host_name           client
          service_description revt
          check_command       check_revt
          notification_interval 0
          check_interval      5
          max_check_attempts  2
          }

define service{
          use                 generic-service
          host_name           client
          service_description load
          check_command       check_by_ssh!load
          notification_interval 0
          check_interval      30
          }

define servicedependency{
        host_name                       client
        service_description             revt
        dependent_host_name             client
        dependent_service_description   load
        execution_failure_criteria      c,u
        notification_failure_criteria   c,u
        }

Okay, so I’ve made some progress but am still getting a few false alarms.

I’ve set the check interval on the revt check to 5 minutes and I’ve set up the other service checks with a retry_interval of 3 minutes. Revt has a max_check_attempts of 1 and the other services have the default of 3.

In theory it will take Nagios 6 minutes to confirm that a service is down after the first failed check (i.e. retry_interval x 2 more checks), which is longer than the check interval for revt. As a result, by the time the service gets to its 3rd and final check, Nagios should have an up-to-date hard state for revt.
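
To make that arithmetic concrete, here is a sketch of the two definitions with the values just described. It assumes the default interval_length of 60 seconds (so the intervals are in minutes) and that anything not shown is inherited from the generic-service template; only the numbers come from the description above.

# Sketch only: timing values as described in this post, everything else assumed.
define service{
          use                 generic-service
          host_name           client
          service_description revt
          check_command       check_revt
          check_interval      5    ; revt is re-checked every 5 minutes
          max_check_attempts  1    ; a single failed check is already a hard state
          }

define service{
          use                 generic-service
          host_name           client
          service_description load
          check_command       check_by_ssh!load
          check_interval      30
          retry_interval      3    ; minutes between re-checks after a failure
          max_check_attempts  3    ; hard state after first failure + 2 retries = 6 minutes
          }

With these numbers, load needs 6 minutes after its first failure to go hard critical, while revt should be checked again (and go hard critical on one failure) within 5 minutes, assuming its scheduled check is not delayed.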

However, it still appears to send me notifications that a bunch of services are down on a host, with the last notification being that revt is down, i.e. it should only have sent a notification to say that revt was down.

Dear G1_,

I’m afraid that Service Dependencies won’t work as expected. Take a look at:

viewtopic.php?f=59&t=5104&p=16828&hilit=dependencies#p16828

Hope it helps.