Notification after 2nd time service is down

Hi,

Is there a way to configure Nagios so that it only sends out an email notification the second time a service is down? I have it configured to try three times (max_check_attempts) when a service goes down, but this does not do what I want.

We have some unreliable links that go down occasionally for a few minutes but generally come back up 5 minutes later, so I don’t want to be notified unless a link is still down after two five-minute check cycles.

Thanks in advance.

Hey there,

I had the same problem. Your best bet is to set max_check_attempts to 3 and retry_interval to 2 in your service definition. The service will then not go to a hard CRITICAL state (and notify) until the third consecutive failed check. The first failed check counts as attempt 1, so that is two retries after the first failure: 2 x 2 = 4 minutes. Modify as necessary; raise the numbers if you want it to fail more times before it goes critical.

You could do this, or set up a service escalation policy for those services (i.e., first alert goes to a null address, second alert goes to you). Have a look at the Nagios docs for how escalations work.

Hey MP,

Thanks for that. I’ve been trying unsuccessfully to do just that. Would you be able to post some snippets of your configuration?

It’s as easy as including those directives in your service definitions for the flaky links:

define service{
        use                     generic-service
        host_name               xx-flakey
        service_description     Check Flakey Website
        is_volatile             0
        check_period            24x7
        max_check_attempts      3
        normal_check_interval   5
        retry_check_interval    2
        contact_groups          critical-admins
        check_command           check_http!www.flakey.com
        }

normal_check_interval is 5 (the normal case), but when a check comes back critical, Nagios disregards normal_check_interval and checks again after retry_check_interval. If that check is critical too, it checks again after another retry_check_interval. Once max_check_attempts has been reached (in this case, 3), Nagios changes the service to a HARD critical state and sends out an alert. Just increase retry_check_interval or max_check_attempts to raise the time it takes Nagios to change the state of the service.
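
For the case in the original question (alert only if the link is still down after a second five-minute check), a rough sketch of the relevant lines inside the service definition might look like this. The values are my assumption of what fits that requirement, not something I have tested on those links:

```
# Lines inside the existing "define service{ ... }" block (sketch only).
# Attempt 1 is the first failed check; attempt 2 is one retry later.
max_check_attempts      2
normal_check_interval   5
# Recheck 5 minutes after the first failure; alert only if it fails again.
retry_check_interval    5
```

With these values a link that recovers within five minutes of the first failed check should never reach a HARD state, so no notification goes out.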

As long as max_check_attempts and retry_interval are set in your service definition, they will override any template settings.
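
For the escalation route mentioned earlier in the thread, a minimal sketch could look like the following. It assumes the service itself notifies a hypothetical contact group called nobody (whose only contact has a dummy address) and has a non-zero notification_interval so that a second notification is actually generated. Treat it as a starting point and check it against the escalation docs for your version:

```
# Sketch only. "nobody" (set as contact_groups on the service itself) is a
# hypothetical group with a dummy address that swallows notification 1.
# From notification 2 onwards this escalation sends alerts to the real admins.
define serviceescalation{
        host_name               xx-flakey
        service_description     Check Flakey Website
        first_notification      2
        last_notification       0
        notification_interval   5
        contact_groups          critical-admins
        }
```

Here last_notification 0 means the escalation stays in effect for as long as the problem lasts. The retry approach above is simpler, so I would only go this way if you need the first alert logged somewhere.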

Thanks MP!

Now it all makes sense.

Thanks for the help.