Hello there,
I’m having a problem with an event handler.
I defined an event handler that runs a command and sends the results by email.
When I run the script by hand as the nagios user it works fine, but when the corresponding event fires it, it does not seem to execute.
This is the corresponding config:
The event handler, notificar_procs:
#!/bin/sh
# Event handler script: notificar procesos
if [ $# -ne 7 ]; then
echo "USAGE: $0 <SERVICESTATE> <SERVICETYPE> <NUMBEROFTRY> <SERVICEDESC> <HOSTNAME> <CONTACTOS> <HOSTDESC>" >&2
exit 1
fi
fi
case "$1" in
OK)
;;
WARNING)
;;
UNKNOWN)
;;
CRITICAL)
case "$2" in
SOFT)
;;
HARD)
# Create the file in /home/nagiosclients/
/usr/local/nagios/libexec/check_nrpe -H $5 -c check_procs
sleep 1
# E-mail file
/usr/bin/printf "%b" "***** Nagios AGF Allianz *****\n\nNotification Type: Event Handler\n\nService: $4 \nHost: $5\nDescripcion:$7\nState: $1 " | /usr/bin/mutt -x -s "** Lista de Procesos en $5 **" $6 -a /home/nagiosclients/$(echo $5 |tr a-z A-Z)-$4.txt
;;
esac
;;
esac
exit 0
The definition of the command:
define command{
command_name notificar_procs
command_line $USER1$/notificar_procs $SERVICESTATE$ $SERVICETYPE$ $SERVICEATTEMPT$ $SERVICEDESC$ $HOSTNAME$ $CONTACTEMAIL$ $HOSTALIAS$
}
The service it runs for:
define service{
use generic-service
host_name srvnt04
service_description CPU
is_volatile 0
check_period 24x7
max_check_attempts 3
normal_check_interval 6
retry_check_interval 1
contact_groups grupo_windows
notification_interval 720
notification_period 24x7
notification_options c,r
event_handler notificar_procs
check_command check_nrpe!check_cpu
}
Any help is welcome :), thanks.
event_handler_enabled? I think you might need to set this for the service too, and also make sure event handlers are enabled in nagios.cfg, i.e. enable_event_handlers=1
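A quick way to confirm the global switch is to grep the main config file. This is only a sketch: the helper name handlers_enabled is made up for illustration, and the default path is an assumption based on the /usr/local/nagios/etc paths used elsewhere in this thread.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if event handlers are switched on globally.
handlers_enabled() {
    # $1: path to the main Nagios config file
    grep -Eq '^[[:space:]]*enable_event_handlers[[:space:]]*=[[:space:]]*1[[:space:]]*$' "$1" 2>/dev/null
}

# Assumed default install path; override with CFG=/path/to/nagios.cfg
CFG="${CFG:-/usr/local/nagios/etc/nagios.cfg}"

if handlers_enabled "$CFG"; then
    echo "event handlers enabled in $CFG"
else
    echo "enable_event_handlers=1 not found in $CFG"
fi
```

The same kind of grep against the service template would show whether event_handler_enabled is set there.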
define service{
use generic-service
host_name srvnt04
service_description CPU
is_volatile 0
check_period 24x7
max_check_attempts 3
normal_check_interval 6
retry_check_interval 1
contact_groups grupo_windows
notification_interval 720
notification_period 24x7
notification_options c,r
event_handler notificar_procs
check_command check_nrpe!check_cpu
}
When using templates, if the service does not differ from the template, all you need is the “use” line. Add only the settings that differ. In other words, don’t repeat settings from the template in the service definition.
define service{
use generic-service
host_name srvnt04
service_description CPU
event_handler_enabled 1
}
event_handler_enabled is set in the template, and event handlers are enabled in nagios.cfg too; I forgot to mention that.
In the Nagios logs I can see the event handler being called when the service changes state, but I don’t see the script executing.
Thanks a lot jakkedup.
Bye.
I don’t see
event_handler_enabled 1
in the template you pasted, but I will take your word for it.
So now my question is, where did you get those macro names?
SERVICETYPE ???
nagios.sourceforge.net/docs/2_0/macros.html
Completely right.
I was about to post the corrected macro names.
SERVICETYPE should have been SERVICESTATETYPE, and I had to put double quotes around the HOSTALIAS macro.
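Why the quotes matter can be shown with a small shell experiment. This is a sketch under the assumption that Nagios substitutes macro values into the command line as plain text before a shell parses it; the alias value is the one that shows up in the log further down.

```shell
#!/bin/sh
# Simulate textual macro substitution followed by shell parsing.
alias_value='Print Server #1'
nl='
'

# Unquoted: the alias splits into words and "#1" starts a shell comment.
unquoted=$(sh -c "set -- CRITICAL $alias_value${nl}echo \$#")

# Quoted: the whole alias arrives as a single argument.
quoted=$(sh -c "set -- CRITICAL \"$alias_value\"${nl}echo \$#")

echo "unquoted: $unquoted arguments"
echo "quoted:   $quoted arguments"
```

Without quotes the handler receives the wrong number of arguments, so a check like the `$# -ne 7` test in the script would reject the call.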
Now almost everything is right except one thing.
The macro $CONTACTEMAIL$ does not expand. I changed the script name so I could see in the logs how it is invoked:
[05-08-2006 16:31:02] Warning: Attempting to execute the command "/usr/local/nagios/libexec/notificar_procs CRITICAL SOFT 1 CPU srvnt04 "Print Server #1"" resulted in a return code of 127. Make sure the script or binary you are trying to execute actually exists...
[05-08-2006 16:31:02] SERVICE EVENT HANDLER: srvnt04;CPU;CRITICAL;SOFT;1;notificar_procs
[05-08-2006 16:31:02] SERVICE ALERT: srvnt04;CPU;CRITICAL;SOFT;1;5
So the line is:
/usr/local/nagios/libexec/notificar_procs CRITICAL SOFT 1 CPU srvnt04 "Print Server #1"
The $CONTACTEMAIL$ seems empty. :S
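Another way to see what the handler actually receives, without renaming the script, is to log the argument count and each argument from inside it. A minimal sketch; the log_args function and the LOGFILE path are made up for illustration, and the file must be writable by the nagios user.

```shell
#!/bin/sh
# Hypothetical debugging helper: record how the event handler was invoked.
# LOGFILE is an assumed path; pick one the nagios user can write to.
LOGFILE="${LOGFILE:-/tmp/event_handler_debug.log}"

log_args() {
    # Number of arguments actually received
    printf 'argc=%s\n' "$#" >> "$LOGFILE"
    n=1
    for arg in "$@"; do
        # Brackets make empty or space-containing arguments visible
        printf 'arg%s=[%s]\n' "$n" "$arg" >> "$LOGFILE"
        n=$((n + 1))
    done
}

# Call this at the top of the handler, before the usage check.
log_args "$@"
```

An unquoted macro that expands to nothing does not show up as an empty argument; it simply lowers argc by one.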
Thanks again!!
It seems that no CONTACT macro can be used as an argument to an event handler, so Nagios replaces it with an empty string. :S
The thing is, I want to send an email to the contacts from the event handler.
Basically this is the chain of actions:
- The CPU service goes Critical on a Windows 2k3 box.
- Nagios notifies the contacts about the CPU service.
- Nagios executes the event handler.
- The event handler takes a snapshot of the running processes on the Windows box and copies it to the Nagios box (by scp).
- Nagios sends an email to the contacts with a message and the file as an attachment.
So far, all I need is to pass $CONTACTEMAIL$ to the event handler, and then the whole thing would work.
Perhaps I’ve made things too complicated.
Is there a way to do this without an event handler?
Thanks in advance.
Sounds like you have it set up right. All I’m saying is that the macro names changed from v1.x to 2.x, so you need to double-check your scripts to make sure all of them are correct.
I’ve finally written a simple script that takes a contact group name as its argument and returns the mailboxes of each member of that group.
That way I can send the notification from inside the event handler.
Everything’s working now.
The script can be useful for many other tasks; here it is:
#!/bin/sh
#
# It returns mailboxes of contacts that belong to arg1 group.
#
# by Mauro Oddi
if [ $# -ne 1 ]; then
echo "USAGE: $0 <GROUPNAME>" >&2
exit 1
fi
NAME=$1
ETCGRP=/usr/local/nagios/etc/contactgroups.cfg
ETCCON=/usr/local/nagios/etc/contacts.cfg
if [ -f "$ETCGRP" -a -f "$ETCCON" ]; then
if grep -ne "contactgroup_name" $ETCGRP| grep "\<$NAME\>" >/dev/null; then
LISTA1=$( sed -n "/$NAME/,/}/p" $ETCGRP | grep members | awk '{ print $2 }' )
else
echo "Group $NAME does not exist in $ETCGRP." >&2
exit 1
fi
LISTA2=$(
for i in $(echo $LISTA1 | tr ',' ' '); do
sed -n "/$i/,/}/p" $ETCCON | grep "^[[:space:]]*email" | awk '{ print $2 }'
done
)
echo $LISTA2
exit
fi
echo "Could not open file/s." >&2
exit 1
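One way the event handler might call the script above instead of relying on $CONTACTEMAIL$ is sketched here. The install path in LOOKUP and the function name recipients_for_group are assumptions for illustration; the thread does not say what the script is called or where it lives.

```shell
#!/bin/sh
# Sketch: resolve recipients inside the event handler.
# LOOKUP is a hypothetical install path for the group-lookup script.
LOOKUP="${LOOKUP:-/usr/local/nagios/libexec/get_group_mails}"

recipients_for_group() {
    # Prints the mailboxes for contact group $1 on one line.
    # Returns non-zero if the lookup fails or finds nobody.
    list=$("$LOOKUP" "$1") || return 1
    [ -n "$list" ] || return 1
    printf '%s\n' "$list"
}

# Example use in the CRITICAL/HARD branch of the handler:
#   TO=$(recipients_for_group grupo_windows) || exit 1
#   ...then pass $TO to the mail command in place of $6.
```

With this approach the handler only needs the contact group name, which can be hard-coded or passed as a plain argument in the command definition.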
Any criticism is welcome.
out.