[quote=“dmusser2005”]I’m on the beta version, and I have it set up with the following.
First Escalation - Admin: Notified every hour for the first 2.
Second Escalation - Admin & Manager: Notified on hour 3 (only)
Third Escalation - Admin: Notified every hour from 4 on.
This works fine. The problem is if I am in hour 6 and the service returns to normal it never goes back to the second group to notify any members who were notified of the outage that the service has recovered. It does notify the Admin group in the third escalation that it has recovered, but the managers are never notified.
Is there a way to have Nagios keep track of the users for a particular state change that are notified, so if the state returns to normal they are again notified of this?[/quote]
Think about what you are doing here. First, you alert the admins and they don’t acknowledge or fix the problem. Then you contact the admins and the managers, and they don’t acknowledge or fix it either. Obviously, these must be some very important machines or you wouldn’t bother the managers. So why would you stop contacting the managers at level 3? Why even have a level 3? It’s obvious that nobody is fixing the problem, you have production to run, and somebody needs to fix it now. So if you are going to define a level 3, then include the managers, the admins, and the president of your company if you must, but it needs to be fixed NOW. Otherwise, drop level 3 and just use 1 and 2 as you have them. It makes no sense to leave out the managers, since they are going to whip someone’s tail for not fixing this problem sooner.
By simply using 1 and 2 as you have them, Nagios is doing what it is supposed to be doing, i.e. escalating the notifications. The very definition of the word means that you never go backwards.
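
If you do keep a level 3, here is a rough sketch of what including the managers could look like. The host name, service description, and contact group names are made up for illustration, and the 60-minute interval assumes your default interval_length of 60 seconds, so adjust everything to your own config. Since you saw the recovery notice go only to the group in the escalation that was active at the time, keeping the managers in level 3 should mean they get the recovery as well, but test it on your beta version before relying on it.

[code]
# Level 1: notifications 1 and 2 go to the admins only, one per hour.
define serviceescalation{
        host_name               webserver1        ; hypothetical host
        service_description     HTTP              ; hypothetical service
        first_notification      1
        last_notification       2
        notification_interval   60
        contact_groups          admins            ; hypothetical group name
        }

# Level 2: notification 3 goes to the admins and the managers.
define serviceescalation{
        host_name               webserver1
        service_description     HTTP
        first_notification      3
        last_notification       3
        notification_interval   60
        contact_groups          admins,managers   ; hypothetical group names
        }

# Level 3: notification 4 onward (last_notification 0 = no upper limit).
# The managers stay on the list instead of dropping back to admins only.
define serviceescalation{
        host_name               webserver1
        service_description     HTTP
        first_notification      4
        last_notification       0
        notification_interval   60
        contact_groups          admins,managers
        }
[/code]

The only change from your setup is the contact_groups line in the third definition. If hourly mail to the managers is too much, a longer notification_interval on level 3 is one option to consider.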