Avoiding multiple notifications


#1

Hi all,
I have had Nagios running for almost a year now, with great satisfaction. The only problem is that when a non-critical machine goes down during the weekend, we get a notification every 15-20 minutes, depending on the setup. It would be nice to get only one notification when the machine goes down and one when (and if) it comes back online. In itself, that would not be a problem using an escalation. The problem comes up when the machine is critical: in that case we have a second escalation which sends an email to an SMS gateway, and if the machine comes back up we are missing some notifications.

Any idea for a better way of doing this?

Thanks in advance for any info.

Luca


#2

Change the notification interval to whatever you feel is appropriate for that service. From the docs: "If you set this value to 0, Nagios will not re-notify contacts about problems for this service - only one problem notification will be sent out."
Each service check can have its own “notification_interval” setting, so set each one to whatever works for you.
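
To make the effect of the setting concrete, here is a rough Python sketch of when problem notifications would go out during an outage. It is only a simplified model, not Nagios code: the function name `problem_notification_times` and the assumption that re-notifications land exactly every `notification_interval` minutes are mine, for illustration.

```python
def problem_notification_times(outage_minutes, notification_interval):
    """Minutes after the first alert at which problem notifications go out.

    Simplified, hypothetical model: an interval of 0 means "notify once,
    never re-notify", as described in the docs quoted above.
    """
    if notification_interval == 0:
        return [0]
    return list(range(0, outage_minutes, notification_interval))


# A two-hour outage with re-notification every 20 minutes:
print(problem_notification_times(120, 20))  # [0, 20, 40, 60, 80, 100]

# The same outage with the interval set to 0:
print(problem_notification_times(120, 0))   # [0]
```

Under this model, a machine that stays down for a whole 48-hour weekend at a 20-minute interval would produce 144 problem notifications, against a single one with the interval set to 0.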


#3

This would solve the multiple notifications, but I couldn’t use escalations anymore, because they rely on the Nth notification being sent… or am I missing something?

Thank you, Luca


#4

For non-critical machines, I’d set the “notification_interval” to 0. That way, only one notification will be sent when it goes down, and one when it comes back OK. You may even want to create a timeperiod entry for these machines, so that notifications are only sent during certain hours instead of 24x7.

For critical devices, you would of course want to be notified every hour or so, so a setting of 60 might be good for notification_interval. You might even want to escalate the problem if the first group of people don’t “disable notifications” or “acknowledge” the problem. Read the docs on escalations and try to visualize how it would work where you work.

Example: Joe Tech is notified of a problem once every hour.
He is supposed to “acknowledge” the problem. By doing so, everyone in the contact list will be notified that he acknowledged it, and no further notifications will go out until the service changes state.
You may want the procedure to be “Don’t disable notifications, and don’t acknowledge”. That way, if the problem remains, the notifications will go out on the 3rd interval due to your escalation settings, and the “boss” will know there is a critical device down and that it’s been 3 hours now. In that case, Joe Tech can put a comment on the failed service saying “I’m working on the problem, so please give me a few hours to purchase the parts” or whatever.

If you make your settings correct in the escalations.cfg, then you might have this:
Notifications 1-3 go out to "General techs"
4-6 go out to "System Analysts"
7-10 go out to "Da Boss"
There are no further escalations past 10, so just General Techs will get notified from now on, every “notification_interval” or 60 minutes. Somebody has to acknowledge the problem to stop the notifications from going out, or fix the problem.
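
If it helps to visualize the ranges above, here is a small Python sketch of who would receive the Nth notification. It is only a model of the example, not Nagios code: the names `ESCALATIONS`, `DEFAULT_GROUP` and `recipients` are made up for illustration, and it assumes the service's normal contact group is "General techs".

```python
# Hypothetical model of the example ranges: (first, last, contact group).
ESCALATIONS = [
    (1, 3, "General techs"),
    (4, 6, "System Analysts"),
    (7, 10, "Da Boss"),
]

# Assumption: the contact group defined on the service itself.
DEFAULT_GROUP = "General techs"


def recipients(notification_number):
    """Return the groups that would get the Nth problem notification."""
    matched = [
        group
        for first, last, group in ESCALATIONS
        if first <= notification_number <= last
    ]
    # When no escalation range matches, fall back to the normal contacts.
    return matched or [DEFAULT_GROUP]


for n in (1, 4, 7, 11):
    print(n, recipients(n))
```

This also shows why a notification_interval of 0 and multi-step escalations don't mix: if only notification 1 is ever sent, the ranges starting at 4 and 7 are never reached.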


#5

In fact, I didn’t think about setting only the non-critical services to 0… we use 20 minutes for checks, and I think we can live with multiple notifications on critical machines…

It’s the easy things which usually work best… :smiley:

Thank you very much :slight_smile:

Luca

[Edited Fri Feb 11 2005, 03:15PM]


#6

You mean I actually helped someone, finally? Good God, what next?