Escalations and notifications of service restored

dmusser2005 · October 20, 2005, 3:53am

I’m on the beta version, and I have it set up as follows.

First Escalation - Admin: Notified every hour for the first 2.

Second Escalation - Admin & Manager: Notified on hour 3 (only)

Third Escalation - Admin: Notified every hour from 4 on.

This works fine. The problem is if I am in hour 6 and the service returns to normal it never goes back to the second group to notify any members who were notified of the outage that the service has recovered. It does notify the Admin group in the third escalation that it has recovered, but the managers are never notified.

Is there a way to have Nagios keep track of the users for a particular state change that are notified, so if the state returns to normal they are again notified of this?

Clipper · October 20, 2005, 7:47am

Hi,

Normally Nagios always sends acknowledgement and recovery notifications to the people who were alerted. Whatever the escalation path is, anyone who was alerted at some point should get a notification when the service is back.

Maybe you can paste your contacts and escalation configs here so we can have a look at them?

Cheers
Clipper

luca · October 20, 2005, 10:28am

Possibly the recovery option (r) is missing from the manager's notification options?
What happens if the service comes back up just after the third notification (admin & manager)?

Luca

dmusser2005 · October 20, 2005, 1:43pm

I assumed that is how it would work as well, so I am sure the problem must be in my configuration.

Below please see my configurations:

Contacts

# 'nagios' contact definition
define contact{
contact_name dmusser
alias David Musser
service_notification_period 24x7
host_notification_period 24x7
service_notification_options w,u,c,r
host_notification_options d,u,r
service_notification_commands notify-by-email
host_notification_commands host-notify-by-email
email [email protected]
}

# 'nagios' contact definition

define contact{
contact_name jdoe
alias john doe
service_notification_period 24x7
host_notification_period 24x7
service_notification_options w,u,c,r
host_notification_options d,u,r
service_notification_commands notify-by-email
host_notification_commands host-notify-by-email
email [email protected]
}

# 'nagios' contact definition

define contact{
contact_name ssmith
alias Sam Smith
service_notification_period 24x7
host_notification_period 24x7
service_notification_options w,u,c,r
host_notification_options d,u,r
service_notification_commands notify-by-email
host_notification_commands host-notify-by-email
email [email protected]
}

Escalations.cfg

define serviceescalation{
host_name Server1
service_description FTP
first_notification 2
last_notification 2
notification_interval 60
contact_groups firstline,managers
}

define serviceescalation{
host_name Server1
service_description FTP
first_notification 3
last_notification 0
notification_interval 60
contact_groups firstline
}

contactgroup.cfg

define contactgroup{
contactgroup_name firstline
alias First Line Administrators Group
members dmusser,jdoe
}

define contactgroup{
contactgroup_name managers
alias Managers Group
members ssmith
}

Service

# Service definition
define service{
    use                             generic-service ; Name of service template to use
    host_name                       Server1
    service_description             FTP
    is_volatile                     0
    check_period                    24x7
    max_check_attempts              1
    normal_check_interval           15
    retry_check_interval            2
    contact_groups                  firstline
    notification_interval           60
    notification_period             24x7
    notification_options            w,u,c,r
    check_command                   check_ftp
    }

dmusser2005 · October 20, 2005, 1:58pm

No, they all have w,u,c,r for the service. I believe that if the service recovers before it reaches the 3rd level, the recovery notification will be sent to the managers. I’m testing that now.

dmusser2005 · October 20, 2005, 2:15pm

I just tested it, and if the service recovers before the 3rd level then it does notify the managers group. But if it gets to the 3rd level then it just notifies the firstline group.

Clipper · October 20, 2005, 3:40pm

Well, now I seem to remember that the escalation path must ALWAYS include the previous escalation for this to work. I think Nagios sends recoveries only to the people that have received the LAST round of notifications (thus not the managers at that point).

That’s by design I’m afraid

Clipper
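
The behaviour described above can be modelled in a few lines. This is only a sketch of what the thread reports, not Nagios source code: the escalation numbers and group names are copied from the posted config, while `groups_for` and `recovery_groups` are hypothetical helper names, and the rule that a recovery reuses the groups of the last problem notification is an assumption based on David's test results.

```python
# Model of the behaviour reported in this thread; not taken from Nagios itself.
BASE_GROUPS = {"firstline"}  # contact_groups on the service definition

# (first_notification, last_notification, contact_groups)
# last_notification == 0 means "no upper limit", as in the posted config.
ESCALATIONS = [
    (2, 2, {"firstline", "managers"}),
    (3, 0, {"firstline"}),
]


def groups_for(notification_number):
    """Return the contact groups that apply to a given notification number."""
    matched = set()
    for first, last, groups in ESCALATIONS:
        in_range = notification_number >= first and (
            last == 0 or notification_number <= last
        )
        if in_range:
            matched |= groups
    # No escalation matched: fall back to the service's own contact groups.
    return matched or set(BASE_GROUPS)


def recovery_groups(last_problem_notification):
    """Assumption: a recovery goes to the groups of the last problem notification."""
    return groups_for(last_problem_notification)


for number in (1, 2, 3, 6):
    print(number, sorted(groups_for(number)))
```

Under this model a recovery after notification 2 reaches both groups, while a recovery after notification 6 reaches only firstline, which matches what David observed.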

dmusser2005 · October 20, 2005, 6:06pm

I did see this in the docs: "When defining notification escalations, it is important to keep in mind that any contact groups that were members of 'lower' escalations (i.e. those with lower notification number ranges) should also be included in 'higher' escalation definitions. This should be done to ensure that anyone who gets notified of a problem continues to get notified as the problem is escalated."

What I did not understand was that if you remove a group from a higher escalation, its members will not be notified when the problem is fixed, even though they were already notified of the error. I will have to see if there is any way I can write some extra logic into my notification scripts.

Thanks
David
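
The rule quoted from the docs can be checked mechanically. The sketch below is a hypothetical helper (`dropped_groups` is not part of Nagios) that takes escalations as `(first_notification, last_notification, contact_groups)` tuples, copied by hand from the config, and reports any escalation that drops a group used by a lower one.

```python
def dropped_groups(escalations):
    """Return (first_notification, missing_groups) for every escalation that
    omits a group present in an escalation with a lower starting number."""
    problems = []
    seen = set()
    for first, _last, groups in sorted(escalations, key=lambda e: e[0]):
        missing = seen - set(groups)
        if missing:
            problems.append((first, sorted(missing)))
        seen |= set(groups)
    return problems


# The escalations as posted in this thread.
thread_config = [
    (2, 2, {"firstline", "managers"}),
    (3, 0, {"firstline"}),
]

# A variant that follows the docs by keeping managers in the higher escalation.
fixed_config = [
    (2, 2, {"firstline", "managers"}),
    (3, 0, {"firstline", "managers"}),
]

print(dropped_groups(thread_config))  # [(3, ['managers'])]
print(dropped_groups(fixed_config))   # []
```

Note that the variant following the docs also means the managers keep receiving the hourly problem notifications from notification 3 on, which is exactly what David was trying to avoid.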

Clipper · October 21, 2005, 12:09pm

I think you can use a variable called $NOTIFICATIONNUMBER$ in your notification script. When it is an alert notification, record this number in a file along with the names of the people receiving it. When it is a recovery notification, read that information back and send the recovery to all the people in question. The only thing I do not know is whether the recovery number actually matches the alert number, or whether the recovery is considered a new notification with a new number. If it does not match, you will have to key the record on other information instead (which host, which service, which state, etc.).

HTH
Clipper
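
A minimal sketch of that idea, keyed on host and service rather than on the notification number, since it is unclear whether the number carries over to the recovery. Everything here is an assumption for illustration: the state file path, the function names, and the argument order are hypothetical, and the script assumes the notification command passes it the values of $NOTIFICATIONTYPE$, $HOSTNAME$, $SERVICEDESC$ and $CONTACTEMAIL$. It only prints the addresses that should receive this notification; the actual mailing is left to the calling command. There is no file locking, so notifications running in parallel could overwrite each other's state.

```python
import json
import sys
from pathlib import Path

STATE_FILE = Path("/tmp/notified_contacts.json")  # hypothetical location


def recipients_for(state, ntype, host, service, contact):
    """Update `state` in place and return the addresses to mail for this call."""
    key = f"{host}/{service}"
    entry = state.get(key)

    if ntype == "PROBLEM":
        # A problem after a recovery starts a fresh outage record.
        if entry is None or entry["recovered"]:
            entry = {"notified": [], "recovered": False}
            state[key] = entry
        if contact not in entry["notified"]:
            entry["notified"].append(contact)
        return [contact]

    if ntype == "RECOVERY":
        if entry is None:
            return [contact]
        if entry["recovered"]:
            # An earlier recovery call already mailed everyone on record.
            return [] if contact in entry["notified"] else [contact]
        if contact not in entry["notified"]:
            entry["notified"].append(contact)
        entry["recovered"] = True
        return list(entry["notified"])

    # Acknowledgements and other types are passed through unchanged.
    return [contact]


def main(argv):
    ntype, host, service, contact = argv[1:5]
    state = json.loads(STATE_FILE.read_text()) if STATE_FILE.exists() else {}
    for address in recipients_for(state, ntype, host, service, contact):
        print(address)
    STATE_FILE.write_text(json.dumps(state))


if __name__ == "__main__" and len(sys.argv) == 5:
    main(sys.argv)
```

Because Nagios runs the notification command once per contact, the first recovery call mails everyone on record and later recovery calls for the same outage stay silent for contacts who were already covered.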

jakkedup · October 22, 2005, 3:21pm

Think about what you are doing here. First, you alert the admins and they don’t acknowledge or fix the problem. Then you contact admins and managers and they don’t ack or fix the problem. Obviously, this must be a very important machine or you wouldn’t bother the managers. So why would you never attempt to contact the managers on level 3? Why even have a level 3? It’s obvious that nobody is fixing the problem, you have production to run, and somebody needs to fix it now. So if you are going to define a level 3, then include the managers, admins, and the president of your company if you must, but it needs to be fixed NOW. Otherwise, just drop level 3 and use 1 and 2 as you have them. It makes no sense to ignore the managers since they are going to whip someone’s tail for not fixing this problem sooner.

By simply using 1 and 2 as you have it, Nagios is doing what it is supposed to be doing, i.e. escalating the notifications. The sheer definition of the word itself would mean that you don’t go backwards ever.