I have Nagios installed and everything runs pretty well. Since I have more than 1000 services monitored across 200 servers, I'm looking for an option to limit the number of email notifications sent out. For example, a maximum of 3 emails would be sent when a service turns CRITICAL. In the meantime, Nagios would keep checking the service's availability and only send another email when it recovers.
I would have thought it would be possible to do something with event handlers, external commands and standard macros to auto-acknowledge the problem (thus suppressing further notifications until recovery) when a host/service check is DOWN/CRITICAL and the corresponding notification number is >= 3. Take a look at… nagios.sourceforge.net/docs/3_0/ … dlers.html and nagios.sourceforge.net/docs/3_0/extcommands.html
and use something like…
[blockquote]ACKNOWLEDGE_HOST_PROBLEM
Command Format:
ACKNOWLEDGE_HOST_PROBLEM;<host_name>;<sticky>;<notify>;<persistent>;<author>;<comment>
Description:
Allows you to acknowledge the current problem for the specified host. By acknowledging the current problem, future notifications (for the same host state) are disabled. If the “sticky” option is set to one (1), the acknowledgement will remain until the host returns to an UP state. Otherwise the acknowledgement will automatically be removed when the host changes state. If the “notify” option is set to one (1), a notification will be sent out to contacts indicating that the current host problem has been acknowledged. If the “persistent” option is set to one (1), the comment associated with the acknowledgement will survive across restarts of the Nagios process. If not, the comment will be deleted the next time Nagios restarts. [/blockquote] nagios.org/developerinfo/ext … mand_id=39
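As a minimal sketch of the host variant, the command line can be built with a small shell function. `ack_host_line` is a hypothetical helper name, and the timestamp is passed in as an argument so the output is predictable; in practice you would pass `$(date +%s)`.

```shell
#!/bin/sh
# Hypothetical helper: print one ACKNOWLEDGE_HOST_PROBLEM external command line.
# Args: timestamp host_name sticky notify persistent author comment
# Nagios separates fields with ";", so keep ";" out of the host name and author.
ack_host_line() {
    printf '[%s] ACKNOWLEDGE_HOST_PROBLEM;%s;%s;%s;%s;%s;%s\n' \
        "$1" "$2" "$3" "$4" "$5" "$6" "$7"
}
```

For example, `ack_host_line "$(date +%s)" host1 1 1 1 Andy "Auto-acknowledged" > "$commandfile"`, where `$commandfile` is the `command_file` path set in your nagios.cfg.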
[blockquote]ACKNOWLEDGE_SVC_PROBLEM
Command Format:
ACKNOWLEDGE_SVC_PROBLEM;<host_name>;<service_description>;<sticky>;<notify>;<persistent>;<author>;<comment>
Description:
Allows you to acknowledge the current problem for the specified service. By acknowledging the current problem, future notifications (for the same service state) are disabled. If the “sticky” option is set to one (1), the acknowledgement will remain until the service returns to an OK state. Otherwise the acknowledgement will automatically be removed when the service changes state. If the “notify” option is set to one (1), a notification will be sent out to contacts indicating that the current service problem has been acknowledged. If the “persistent” option is set to one (1), the comment associated with the acknowledgement will survive across restarts of the Nagios process. If not, the comment will be deleted the next time Nagios restarts. [/blockquote] nagios.org/developerinfo/ext … mand_id=40
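For the service variant, here is a sketch that appends the command to the external command file and refuses to write if that file isn't writable (for example, if the named pipe doesn't exist). `submit_svc_ack` and its argument order are assumptions; sticky/notify/persistent are fixed at 1, the values used in the command quoted later in this thread.

```shell
#!/bin/sh
# Hypothetical helper: append ACKNOWLEDGE_SVC_PROBLEM to the external command file.
# Args: command_file host_name service_description author comment
submit_svc_ack() {
    cmdfile=$1
    if [ ! -w "$cmdfile" ]; then
        echo "submit_svc_ack: $cmdfile is not writable" >&2
        return 1
    fi
    now=$(date +%s)
    # sticky=1 notify=1 persistent=1, as in this thread.
    # Later Nagios docs describe 2 as the "sticky" value; check your version.
    printf '[%s] ACKNOWLEDGE_SVC_PROBLEM;%s;%s;1;1;1;%s;%s\n' \
        "$now" "$2" "$3" "$4" "$5" >> "$cmdfile"
}
```

Usage might look like `submit_svc_ack /usr/local/nagios/var/rw/nagios.cmd host1 SMTP Andy "Auto-acknowledged"`; that path is common for source installs but is an assumption, so use the `command_file` value from your nagios.cfg.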
[blockquote]$SERVICESTATE$ A string indicating the current state of the service (“OK”, “WARNING”, “UNKNOWN”, or “CRITICAL”). [/blockquote] nagios.sourceforge.net/docs/3_0/ … rvicestate
[blockquote]$HOSTNOTIFICATIONNUMBER$ The current notification number for the host. The notification number increases by one (1) each time a new notification is sent out for the host (except for acknowledgements). The notification number is reset to 0 when the host recovers (after the recovery notification has gone out). Acknowledgements do not cause the notification number to increase, nor do notifications dealing with flap detection or scheduled downtime.
…
$SERVICENOTIFICATIONNUMBER$ The current notification number for the service. The notification number increases by one (1) each time a new notification is sent out for the service (except for acknowledgements). The notification number is reset to 0 when the service recovers (after the recovery notification has gone out). Acknowledgements do not cause the notification number to increase, nor do notifications dealing with flap detection or scheduled downtime. [/blockquote] nagios.sourceforge.net/docs/3_0/ … tionnumber
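A minimal sketch of the decision the script has to make from the two macros above. `should_ack` and `MAX_NOTIFICATIONS` are hypothetical names, and the macro values are assumed to arrive as arguments; a non-numeric count (for example an empty or unexpanded macro) is treated as "don't acknowledge". One caveat worth checking against the event handler docs: event handlers run while a service is in a SOFT problem state, when it first enters a HARD problem state, and when it recovers, not on every re-notification. So this check may need to run from the notification command instead, where $SERVICENOTIFICATIONNUMBER$ should reflect the notification being sent.

```shell
#!/bin/sh
# Limit taken from the "at most 3 emails" requirement; override via environment.
MAX_NOTIFICATIONS=${MAX_NOTIFICATIONS:-3}

# should_ack STATE NOTIFICATION_NUMBER
# Exit status 0 means "acknowledge now", non-zero means "leave it alone".
should_ack() {
    state=$1
    count=$2
    # Only act on CRITICAL; OK, WARNING and UNKNOWN are left untouched.
    [ "$state" = "CRITICAL" ] || return 1
    # Reject empty or non-numeric counts.
    case $count in
        ''|*[!0-9]*) return 1 ;;
    esac
    [ "$count" -ge "$MAX_NOTIFICATIONS" ]
}
```

Called as `should_ack "$1" "$2"` with `"$SERVICESTATE$" "$SERVICENOTIFICATIONNUMBER$"` passed in by the command definition, it returns success from the third notification onwards.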
Hi
Don’t worry about the (null) values in “Sep 11 11:02:56 ipmonitor nagios: SERVICE EVENT HANDLER: host1;SMTP;(null);(null);(null);nagios-ack”. They’re a red herring: if I recall correctly, those values just aren’t populated at the point the log entry is written.
I think the issue might be with one of the macros you have used in the command script…
[blockquote]/usr/bin/printf "%lu] ACKNOWLEDGE_SVC_PROBLEM;$HOSTADDRESS$;$SERVICEDESC$;1;1;1;Andy;Acknowledged\n" $now > $commandfile[/blockquote]
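I believe it should be $HOSTNAME$, not $HOSTADDRESS$: ACKNOWLEDGE_SVC_PROBLEM expects the host_name as defined in your Nagios config, not the host’s address. It’s also worth checking that the quoted line starts with “[%lu]” rather than “%lu]”, since each external command has to be written as “[timestamp] COMMAND;arguments”.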
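Putting it together, here is a hedged sketch of a complete handler script using $HOSTNAME$ rather than $HOSTADDRESS$ and with the “[” restored before the timestamp. The script name, the command name auto-ack, the argument order and the COMMAND_FILE default are all assumptions; it also uses the shell's printf with %s for the timestamp instead of /usr/bin/printf with %lu.

```shell
#!/bin/sh
# Hypothetical handler, e.g. saved as $USER1$/auto_ack.sh and wired up with:
#   define command{
#       command_name  auto-ack
#       command_line  $USER1$/auto_ack.sh "$SERVICESTATE$" "$SERVICENOTIFICATIONNUMBER$" "$HOSTNAME$" "$SERVICEDESC$"
#   }
# COMMAND_FILE default is an assumption; set it to command_file from nagios.cfg.
COMMAND_FILE=${COMMAND_FILE:-/usr/local/nagios/var/rw/nagios.cmd}
MAX_NOTIFICATIONS=${MAX_NOTIFICATIONS:-3}

auto_ack() {
    state=$1 count=$2 host_name=$3 service_desc=$4
    # Act only on CRITICAL with a numeric notification count >= the limit.
    [ "$state" = "CRITICAL" ] || return 0
    case $count in ''|*[!0-9]*) return 0 ;; esac
    [ "$count" -ge "$MAX_NOTIFICATIONS" ] || return 0
    now=$(date +%s)
    # $HOSTNAME$ (the Nagios host_name), not $HOSTADDRESS$, and "[" before the time.
    # sticky=1 as in this thread; later Nagios docs describe 2 as sticky.
    printf '[%s] ACKNOWLEDGE_SVC_PROBLEM;%s;%s;1;1;1;Andy;Acknowledged\n' \
        "$now" "$host_name" "$service_desc" >> "$COMMAND_FILE"
}

auto_ack "$@"
```

The script exits quietly unless the service is CRITICAL with a notification count of at least MAX_NOTIFICATIONS, so OK, WARNING and UNKNOWN states never trigger an acknowledgement.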