Notification Control : Maximum Alerts to Send

andy18 · August 26, 2008, 7:16am

Hi all,

I have Nagios installed and everything runs pretty well. Since I have more than 1000 services to monitor across my 200 servers, I am looking for a way to limit the number of email notifications that get sent out. Say, a maximum of 3 emails is sent when a service turns Critical. In the meantime, Nagios continues to check the service's availability and only sends another email when it recovers.

Strides · August 28, 2008, 10:10am

I would have thought it would be possible to do something with event handlers, external commands and standard macros to auto-acknowledge (thus suppressing further notifications until recovery) when a host/service check is down/critical and the appropriate notification count is >= 3. Take a look at…
nagios.sourceforge.net/docs/3_0/ … dlers.html
nagios.sourceforge.net/docs/3_0/extcommands.html

and use something like…
[blockquote]ACKNOWLEDGE_HOST_PROBLEM
Command Format:
ACKNOWLEDGE_HOST_PROBLEM;<host_name>;<sticky>;<notify>;<persistent>;<author>;<comment>
Description:
Allows you to acknowledge the current problem for the specified host. By acknowledging the current problem, future notifications (for the same host state) are disabled. If the “sticky” option is set to one (1), the acknowledgement will remain until the host returns to an UP state. Otherwise the acknowledgement will automatically be removed when the host changes state. If the “notify” option is set to one (1), a notification will be sent out to contacts indicating that the current host problem has been acknowledged. If the “persistent” option is set to one (1), the comment associated with the acknowledgement will survive across restarts of the Nagios process. If not, the comment will be deleted the next time Nagios restarts. [/blockquote] nagios.org/developerinfo/ext … mand_id=39

[blockquote]ACKNOWLEDGE_SVC_PROBLEM
Command Format:
ACKNOWLEDGE_SVC_PROBLEM;<host_name>;<service_description>;<sticky>;<notify>;<persistent>;<author>;<comment>
Description:
Allows you to acknowledge the current problem for the specified service. By acknowledging the current problem, future notifications (for the same service state) are disabled. If the “sticky” option is set to one (1), the acknowledgement will remain until the service returns to an OK state. Otherwise the acknowledgement will automatically be removed when the service changes state. If the “notify” option is set to one (1), a notification will be sent out to contacts indicating that the current service problem has been acknowledged. If the “persistent” option is set to one (1), the comment associated with the acknowledgement will survive across restarts of the Nagios process. If not, the comment will be deleted the next time Nagios restarts. [/blockquote] nagios.org/developerinfo/ext … mand_id=40

[blockquote]$HOSTSTATE$ A string indicating the current state of the host (“UP”, “DOWN”, or “UNREACHABLE”). [/blockquote] nagios.sourceforge.net/docs/3_0/ … #hoststate

[blockquote]$SERVICESTATE$ A string indicating the current state of the service (“OK”, “WARNING”, “UNKNOWN”, or “CRITICAL”). [/blockquote] nagios.sourceforge.net/docs/3_0/ … rvicestate

[blockquote]$HOSTNOTIFICATIONNUMBER$ The current notification number for the host. The notification number increases by one (1) each time a new notification is sent out for the host (except for acknowledgements). The notification number is reset to 0 when the host recovers (after the recovery notification has gone out). Acknowledgements do not cause the notification number to increase, nor do notifications dealing with flap detection or scheduled downtime.
…
$SERVICENOTIFICATIONNUMBER$ The current notification number for the service. The notification number increases by one (1) each time a new notification is sent out for the service (except for acknowledgements). The notification number is reset to 0 when the service recovers (after the recovery notification has gone out). Acknowledgements do not cause the notification number to increase, nor do notifications dealing with flap detection or scheduled downtime. [/blockquote] nagios.sourceforge.net/docs/3_0/ … tionnumber
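
To make the idea a bit more concrete, here is a minimal sketch of the decision such an event handler might make. It assumes the script receives the service state and the notification number as arguments (for example by putting $SERVICESTATE$ and $SERVICENOTIFICATIONNUMBER$ on the handler's command line). The function name `should_ack` and the default limit of 3 are hypothetical, not anything Nagios provides.

```shell
#!/bin/sh
# Hypothetical helper: decide whether a service problem should be
# auto-acknowledged.
# Assumed arguments (values Nagios would pass on the command line):
#   $1 = service state          (e.g. the value of $SERVICESTATE$)
#   $2 = notification number    (e.g. the value of $SERVICENOTIFICATIONNUMBER$)
#   $3 = maximum notifications  (optional, defaults to 3)
should_ack() {
    state=$1
    count=$2
    limit=${3:-3}
    # Succeeds only for a CRITICAL service whose count has reached the limit.
    [ "$state" = "CRITICAL" ] && [ "$count" -ge "$limit" ]
}

# Example call with literal values instead of Nagios macros.
if should_ack "CRITICAL" 3; then
    echo "acknowledge"
else
    echo "keep notifying"
fi
```

If the function succeeds, the handler would go on to submit the ACKNOWLEDGE_SVC_PROBLEM external command; otherwise it would simply exit and let notifications continue.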

HTH

/S

andy18 · September 11, 2008, 3:08am

Hey Strides,

Thanks for the tips, but I am stuck on getting this to work. Here’s what I did:

Created the event handler at /usr/local/nagios/libexec/nagios-ack, based on the Acknowledge service problem sample at nagios.org/developerinfo/ext … mand_id=40
Defined the event handler for the services in templates.cfg (I have one service template shared by all the hosts)

I have restarted Nagios and can see the following in the messages log, but I still continue to receive notifications while the service is down.

Sep 11 11:02:56 ipmonitor nagios: SERVICE EVENT HANDLER: host1;SMTP;(null);(null);(null);nagios-ack

Following is the service definition file in templates.cfg

[blockquote]
define service{
    name                            HostServices1
    notifications_enabled           1
    is_volatile                     1
    max_check_attempts              5
    normal_check_interval           1
    retry_check_interval            1
    active_checks_enabled           1
    passive_checks_enabled          1
    check_period                    24x7
    parallelize_check               1
    obsess_over_service             1
    check_freshness                 1
    freshness_threshold             60
    low_flap_threshold              10
    high_flap_threshold             30
    flap_detection_enabled          1
    failure_prediction_enabled      1
    process_perf_data               1
    retain_status_information       1
    retain_nonstatus_information    1
    contact_groups                  Monitoring_group
    notification_interval           0
    notification_period             24x7
    notification_options            w,u,c,r,f
    stalking_options                o,w,u,c
    event_handler_enabled           1
    event_handler                   nagios-ack
}
[/blockquote]

The nagios-ack file:
[blockquote]
#!/bin/sh
# This is a sample shell script showing how you can submit the ACKNOWLEDGE_HOST_PROBLEM command
# to Nagios. Adjust variables to fit your environment as necessary.

now=`date +%s`
commandfile='/usr/local/nagios/var/rw/nagios.cmd'

/usr/bin/printf "[%lu] ACKNOWLEDGE_SVC_PROBLEM;$HOSTADDRESS$;$SERVICEDESC$;1;1;1;Andy;Acknowledged\n" $now > $commandfile
[/blockquote]

Strides · September 11, 2008, 10:47am

Hi
Don’t worry about “Sep 11 11:02:56 ipmonitor nagios: SERVICE EVENT HANDLER: host1;SMTP;(null);(null);(null);nagios-ack”. The (null) is a red herring: something about the values not being populated at the point the log file is written, if I recall.
I think the issue might be with one of the macros you have used in the command script…
[blockquote]/usr/bin/printf "[%lu] ACKNOWLEDGE_SVC_PROBLEM;$HOSTADDRESS$;$SERVICEDESC$;1;1;1;Andy;Acknowledged\n" $now > $commandfile[/blockquote]
I believe it should be $HOSTNAME$, not $HOSTADDRESS$.

[blockquote]Command Format:
ACKNOWLEDGE_SVC_PROBLEM;<host_name>;<service_description>;<sticky>;<notify>;<persistent>;<author>;<comment>[/blockquote]
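
A related point, as far as I understand it: Nagios substitutes macros in the command definition's command_line, not inside the script file itself, and inside a double-quoted string the shell will try to expand $HOSTADDRESS and $SERVICEDESC as its own (unset) variables. A minimal sketch that takes the host name and service description as arguments instead is below. The function name `build_ack_line` and the argument order are assumptions for illustration, and it prints the line rather than writing to the command file.

```shell
#!/bin/sh
# Hypothetical sketch: build an ACKNOWLEDGE_SVC_PROBLEM external command line.
# Assumed arguments (what Nagios would pass on the command line):
#   $1 = host name            (e.g. the value of $HOSTNAME$)
#   $2 = service description  (e.g. the value of $SERVICEDESC$)
build_ack_line() {
    host=$1
    service=$2
    now=$(date +%s)
    # sticky=1, notify=1, persistent=1; author and comment as in the post above.
    printf '[%s] ACKNOWLEDGE_SVC_PROBLEM;%s;%s;1;1;1;Andy;Acknowledged\n' \
        "$now" "$host" "$service"
}

# Print the line for inspection. In a real handler the output would be
# redirected to the command file, for example:
#   build_ack_line "$1" "$2" > /usr/local/nagios/var/rw/nagios.cmd
build_ack_line "host1" "SMTP"
```

Printing first makes it easy to check that the host name and service description land in the right fields before anything is sent to Nagios.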

HTH

/S