I’m on the beta version, and I have it set up as follows:
First escalation - Admin: notified every hour for the first 2 hours.
Second escalation - Admin & Manager: notified in hour 3 only.
Third escalation - Admin: notified every hour from hour 4 on.
This works fine. The problem is that if I am in hour 6 and the service returns to normal, Nagios never goes back to the second group to tell the members who were notified of the outage that the service has recovered. It does notify the Admin group in the third escalation of the recovery, but the managers are never notified.
Is there a way to have Nagios keep track of which users were notified for a particular state change, so that when the state returns to normal they are notified again?
Normally Nagios notifies everyone who was alerted about a problem of any acknowledgement or recovery. Whatever the escalation path is, anyone who was alerted at some point should get a notification when the service is back.
Maybe you can paste your contact and escalation configs here so we can have a look at them?
Possibly the manager contacts are missing recovery in their notification options?
What happens if the service comes back up just after the third notification (admin & manager)?
No, they all have w,u,c,r for the service. I believe, and am going to test this today, that if I catch it before it goes to the 3rd level, the recovery notification will be sent to the managers. I’m testing it now.
[quote=“luca”]Possibly the manager contacts are missing recovery in their notification options?
What happens if the service comes back up just after the third notification (admin & manager)?[/quote]
I just tested it: if I catch it before the 3rd level, it does notify the managers group. But if it gets to the 3rd level, it only notifies the first-line group.
Well, now I seem to remember that the escalation path must ALWAYS include the previous escalation for this to work. I think Nagios sends recoveries only to the people who received the LAST round of notifications (so not the managers at that point).
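To make that behavior concrete, here is a small Python model of the setup from the first post. It is a sketch under assumptions, not Nagios code: each escalation is a first/last notification-number range (0 meaning "no upper bound"), the groups of all matching escalations are combined, and a recovery is assumed to go only to the groups of the last problem notification, as observed in this thread. The group names `admins` and `managers` are placeholders.

```python
from typing import NamedTuple


class Escalation(NamedTuple):
    first: int              # first notification number covered
    last: int               # last notification number covered; 0 = no upper bound
    groups: frozenset[str]  # contact groups notified in this range


def groups_for(n: int, escalations: list[Escalation], default: frozenset[str]) -> frozenset[str]:
    """Groups for notification number n: the union of every matching escalation,
    or the service's own groups (`default`) when none matches."""
    matching = [e.groups for e in escalations
                if e.first <= n and (e.last == 0 or n <= e.last)]
    return frozenset().union(*matching) if matching else default


def ever_alerted(problem_count: int, escalations: list[Escalation], default: frozenset[str]) -> set[str]:
    """Every group that received at least one of the first problem_count notifications."""
    alerted: set[str] = set()
    for n in range(1, problem_count + 1):
        alerted |= groups_for(n, escalations, default)
    return alerted


def recovery_groups(problem_count: int, escalations: list[Escalation], default: frozenset[str]) -> set[str]:
    """Assumed behavior (as observed in this thread): the recovery follows the
    escalation that applied to the LAST problem notification only."""
    return set(groups_for(problem_count, escalations, default))


# The setup from the first post, expressed in notification numbers (one per hour).
DEFAULT = frozenset({"admins"})
OP_SETUP = [
    Escalation(1, 2, frozenset({"admins"})),
    Escalation(3, 3, frozenset({"admins", "managers"})),
    Escalation(4, 0, frozenset({"admins"})),
]

if __name__ == "__main__":
    for count in (3, 6):
        missed = ever_alerted(count, OP_SETUP, DEFAULT) - recovery_groups(count, OP_SETUP, DEFAULT)
        print(f"recovery after notification {count}: missed {sorted(missed)}")
```

Under this model, a recovery right after notification 3 misses nobody, while a recovery after notification 6 misses `['managers']`, which matches the test results reported above.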
I did see this in the docs: "When defining notification escalations, it is important to keep in mind that any contact groups that were members of “lower” escalations (i.e. those with lower notification number ranges) should also be included in “higher” escalation definitions. This should be done to ensure that anyone who gets notified of a problem continues to get notified as the problem is escalated."
What I did not understand was that if you drop a group from a higher escalation, its members will not be notified when the problem is fixed, even if they were already notified of the error. I will have to see whether I can write some extra logic into my notification scripts.
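One way to catch this kind of gap is to check that every higher escalation still contains the groups of the lower ones, as the docs advise. The sketch below does that for escalations written out by hand as Python tuples. It does not parse real Nagios config files, and the group names are placeholders. The second layout keeps the managers in every escalation from notification 3 onward.

```python
from typing import NamedTuple


class Escalation(NamedTuple):
    first: int              # first notification number covered
    last: int               # last notification number (0 = no upper bound); unused by the check
    groups: frozenset[str]  # contact groups in this escalation


def missing_groups(escalations: list[Escalation]) -> list[tuple[int, int, list[str]]]:
    """For each pair where a higher escalation (later first notification) drops
    groups that a lower one had, return (lower first, higher first, dropped groups)."""
    problems = []
    for low in escalations:
        for high in escalations:
            if high.first > low.first:
                dropped = low.groups - high.groups
                if dropped:
                    problems.append((low.first, high.first, sorted(dropped)))
    return problems


OP_SETUP = [
    Escalation(1, 2, frozenset({"admins"})),
    Escalation(3, 3, frozenset({"admins", "managers"})),
    Escalation(4, 0, frozenset({"admins"})),
]
# Managers stay in every escalation from notification 3 onward.
FIXED_SETUP = [
    Escalation(1, 2, frozenset({"admins"})),
    Escalation(3, 0, frozenset({"admins", "managers"})),
]

if __name__ == "__main__":
    print(missing_groups(OP_SETUP))     # the escalation starting at 4 drops "managers"
    print(missing_groups(FIXED_SETUP))  # nothing dropped
```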
I think you can use a macro called $NOTIFICATIONNUMBER$ in your notification script. When it is an alert notification, record this number in a file together with the names of the people receiving it. When it is a recovery notification, read this information back and send the recovery to everyone in question. The only thing is, I do not know whether the recovery's number actually matches the alert's number, or whether it is counted as a new notification with a new number. If it does not match, you will have to key the stored information on something else instead (which host, which service, which state, etc.).
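Here is a rough Python sketch of that idea. Everything specific in it is an assumption: the state file path and argument order are hypothetical, and it assumes Nagios runs the script once per contact with $NOTIFICATIONTYPE$, $HOSTNAME$, $SERVICEDESC$, $CONTACTNAME$ and $NOTIFICATIONNUMBER$ as arguments. Since it is unclear whether a recovery reuses the alert's number, it keys the stored names on host and service, and uses the notification number only to work out who was in the latest round (those people, per the behavior described above, already get the recovery from Nagios). It does no file locking, so concurrent notifications could race.

```python
import json
import sys
from pathlib import Path

# Hypothetical state file; any path writable by the Nagios user would do.
STATE_FILE = Path("/var/tmp/nagios_notified.json")


def _load(path: Path) -> dict:
    return json.loads(path.read_text()) if path.exists() else {}


def _save(path: Path, state: dict) -> None:
    path.write_text(json.dumps(state, indent=2))


def record_problem(path: Path, host: str, service: str, contact: str, number: int) -> None:
    """Remember that `contact` got problem notification `number` for host/service."""
    state = _load(path)
    entry = state.setdefault(f"{host};{service}", {"all": [], "last_number": 0, "last": []})
    if contact not in entry["all"]:
        entry["all"].append(contact)
    if number > entry["last_number"]:
        # A new round started: it replaces the previous "latest round".
        entry["last_number"] = number
        entry["last"] = [contact]
    elif number == entry["last_number"] and contact not in entry["last"]:
        entry["last"].append(contact)
    _save(path, state)


def extra_recovery_recipients(path: Path, host: str, service: str) -> list[str]:
    """Contacts alerted during the outage but not in the latest round.

    The stored entry is removed, so only the first RECOVERY invocation gets names."""
    state = _load(path)
    entry = state.pop(f"{host};{service}", None)
    _save(path, state)
    if entry is None:
        return []
    return sorted(set(entry["all"]) - set(entry["last"]))


def main(args: list[str]) -> None:
    # Hypothetical argument order, e.g. a command_line ending in:
    #   $NOTIFICATIONTYPE$ $HOSTNAME$ "$SERVICEDESC$" $CONTACTNAME$ $NOTIFICATIONNUMBER$
    if len(args) != 5:
        print("usage: TYPE HOST SERVICE CONTACT NUMBER", file=sys.stderr)
        return
    ntype, host, service, contact, number = args
    if ntype == "PROBLEM":
        record_problem(STATE_FILE, host, service, contact, int(number))
    elif ntype == "RECOVERY":
        for name in extra_recovery_recipients(STATE_FILE, host, service):
            # Placeholder: hand this off to whatever mail command is already in use.
            print(f"notify {name}: {service} on {host} has recovered")


if __name__ == "__main__":
    main(sys.argv[1:])
```

Because the first RECOVERY invocation removes the stored entry, the extra contacts are reported once even though Nagios runs the command for each contact in the last escalation.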
[quote=“dmusser2005”]I’m on the beta version, and I have it set up as follows:
First escalation - Admin: notified every hour for the first 2 hours.
Second escalation - Admin & Manager: notified in hour 3 only.
Third escalation - Admin: notified every hour from hour 4 on.
This works fine. The problem is that if I am in hour 6 and the service returns to normal, Nagios never goes back to the second group to tell the members who were notified of the outage that the service has recovered. It does notify the Admin group in the third escalation of the recovery, but the managers are never notified.
Is there a way to have Nagios keep track of which users were notified for a particular state change, so that when the state returns to normal they are notified again?[/quote]
Think about what you are doing here. First, you alert the admins and they don’t acknowledge or fix the problem. Then you contact the admins and managers, and they don’t acknowledge or fix it either. These must be very important machines, or you wouldn’t bother the managers. So why would you stop contacting the managers at level 3? Why even have a level 3? Clearly nobody is fixing the problem, you have production to run, and somebody needs to fix it now. So if you are going to define a level 3, include the managers, the admins, and the president of your company if you must, but it needs to be fixed NOW. Otherwise, drop level 3 and just use levels 1 and 2 as you have them. It makes no sense to leave out the managers, since they are going to whip someone’s tail for not fixing this problem sooner.
By simply using levels 1 and 2 as you have them, Nagios does what it is supposed to do, i.e. escalate the notifications. By the sheer definition of the word, escalation never goes backwards.