UP notifications are sent, DOWN notifications are not


#1

Hello,

I am running 2.0rc1, but I also experienced the problem with 2.0b4.

I have written a custom notification script. With this script, certain hosts that use a certain template only generate UP and UNREACHABLE notifications; their DOWN notifications never arrive. Hosts using other templates get all notifications. If I switch an affected host to a different template, the old behavior remains, while other hosts using that template still get all notifications.

When I use the standard host-notify-by-email command instead, all notifications are sent, regardless of template or host.

At several points in the custom notification script, debugging data is written to a temporary file. These files contain the correct data when an UP or UNREACHABLE notification is sent, but stay empty when a DOWN notification should be sent. This suggests to me that the notification script is not called at all in this case. Unfortunately, I have not been able to get more debugging data out of Nagios. Any help is welcome.
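One way to confirm whether Nagios starts the script at all is to make logging the very first thing it does, before any parsing. The sketch below is hypothetical: the function name `log_invocation` and the log path `/tmp/iAMsend-invocations.log` (overridable via an assumed `IAM_DEBUG_LOG` variable) are not part of the original script. If no entry appears for a DOWN notification, the failure happens before the script runs, for example in how the command line itself is executed.

```bash
#!/bin/bash
# Hypothetical debug preamble: record every invocation before doing anything else,
# so a missing log entry means the script was never started.
log_invocation() {
    local logfile=${IAM_DEBUG_LOG:-/tmp/iAMsend-invocations.log}  # assumed path
    {
        printf '%s pid=%s argc=%s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$$" "$#"
        local i=1 arg
        for arg in "$@"; do
            # Angle brackets make leading/trailing spaces in each argument visible.
            printf '  argv[%d]=<%s>\n' "$i" "$arg"
            i=$((i + 1))
        done
    } >> "$logfile"
}

log_invocation "$@"
```

Logging each argument separately also shows how the shell split the Nagios command line into arguments.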

Here is my host_templates.cfg. The problem affects the hosts that originally used the vpntunnel template.

# host_templates vpnrouter

define host {
    name                          vpnrouter
    process_perf_data             0
    retain_status_information     1
    flap_detection_enabled        1
    low_flap_threshold            0
    high_flap_threshold           0
    retain_nonstatus_information  1
    active_checks_enabled         1
    passive_checks_enabled        1
    check_period                  24x7
    obsess_over_host              1
    check_freshness               1
    freshness_threshold           0
    check_command                 check-host-alive
    max_check_attempts            3
    event_handler_enabled         1
    event_handler                 check-host-alive
    notifications_enabled         1
    notification_interval         15
    notification_period           24x7
    notification_options          d,u,r,f
    contact_groups                xxxoperators-office
    register                      0
}

# host_templates vpntunnel

define host {
    name                          vpntunnel
    process_perf_data             0
    retain_status_information     1
    flap_detection_enabled        1
    low_flap_threshold            0
    high_flap_threshold           0
    retain_nonstatus_information  1
    active_checks_enabled         1
    passive_checks_enabled        1
    check_period                  24x7
    obsess_over_host              1
    check_freshness               1
    freshness_threshold           0
    check_command                 check-host-alive
    max_check_attempts            3
    event_handler_enabled         1
    event_handler                 check-host-alive
    notifications_enabled         1
    notification_interval         15
    notification_period           24x7
    notification_options          d,u,r,f
    contact_groups                ncc_group,xxxoperators-office
    register                      0
}

This is one of the affected hosts:

define host {
    host_name   DSLxxxxTunnel
    alias       xxxx VPN Tunnel
    address     10.0.3.5
    use         vpntunnel
    parents     DSL_xxxx
}

Whereas this host causes no problems at all:

define host {
    host_name   DSL_xxxx
    alias       Praktijk xxxx
    address     nnn.ooo.ppp.qqq
    use         vpnrouter
    parents     xxxxuplink
}

This is the notification command:

define command {
    command_name    hostiamoffice
    command_line    /usr/share/nagios/scripts/bin/iAMsend.sh 1%%$HOSTSTATEID$%%$NOTIFICATIONNUMBER$%%$HOSTCHECKCOMMAND$%%$HOSTNAME$%%$HOSTALIAS$%%$HOSTSTATE$%%$HOSTOUTPUT$%%$LONGDATETIME$%%$NOTIFICATIONTYPE$%%$HOSTADDRESS$
}
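Because the argument in this command_line is not quoted, the shell splits the expanded macros on whitespace, so one notification arrives as several arguments; the script rejoins them with `$*`. The sketch below illustrates the effect with a made-up expansion of the first eight fields (the values and the helper names `count_args` and `rejoin` are hypothetical). It uses a variable rather than a literal command line, so it shows only word splitting, not how the shell parses other special characters.

```bash
#!/bin/bash
# Print how many arguments a command receives.
count_args() {
    printf '%s\n' "$#"
}

# $* joins all arguments with single spaces, as the script's "echo $*" does.
rejoin() {
    printf '%s\n' "$*"
}

# Hypothetical expansion of the first eight fields of the hostiamoffice command.
expanded='1%%0%%0%%check-host-alive%%DSLxxxxTunnel%%xxxx VPN Tunnel%%UP%%PING OK - Packet loss = 0%'

count_args $expanded     # unquoted: split on every space into 9 arguments
count_args "$expanded"   # quoted: exactly 1 argument
rejoin $expanded         # the pieces glued back together with single spaces
```

Rejoining with `$*` only restores the original string when every gap was a single space; runs of spaces or tabs in `$HOSTOUTPUT$` would be collapsed.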

And this is the script:

#!/bin/bash
echo "PID" $$ > /tmp/out.log

datum=$(date +%Y_%m_%d)
datum1=$(date +%d-%b-%Y)
tijd=$(date +%H:%M:%S)
datum2=$(date +%Y_%m_%d_2)
datetimestamp=$(date +%Y%m%d%H%M%N)

# Write to the xxxx file

# Braces are required: "$datetimestamp_00" would expand an unset variable named datetimestamp_00
bestand1=/opt/iAM/server/nagios-${datetimestamp}_00.evt
bestand2=/opt/iAM/server/nagios-${datetimestamp}_00_2.evt

# Fill in the parameters

echo "All Variables" "$*" >> /tmp/iamoutput.txt
notificationperiod=$(echo "$*" | awk '{n=split($0,I,"%%"); print I[1]}')
hoststateid=$(echo "$*" | awk '{n=split($0,I,"%%"); print I[2]}')
notificationnumber=$(echo "$*" | awk '{n=split($0,I,"%%"); print I[3]}')
hostcheckcommand=$(echo "$*" | awk '{n=split($0,I,"%%"); print I[4]}')
hostname=$(echo "$*" | awk '{n=split($0,I,"%%"); print I[5]}')
hostalias=$(echo "$*" | awk '{n=split($0,I,"%%"); print I[6]}')
hoststate=$(echo "$*" | awk '{n=split($0,I,"%%"); print I[7]}')
hostoutput=$(echo "$*" | awk '{n=split($0,I,"%%"); print I[8]}')
longtimedate=$(echo "$*" | awk '{n=split($0,I,"%%"); print I[9]}')
notificationtype=$(echo "$*" | awk '{n=split($0,I,"%%"); print I[10]}')
hostaddress=$(echo "$*" | awk '{n=split($0,I,"%%"); print I[11]}')
event_type=7
valid=1
host=xxxx_nagios_$notificationperiod
date=$datum1
ttime=$tijd

case $hoststateid in
    0)
        state=C
        severity=5
        ;;
    *)
        severity=1
        case $notificationnumber in
            0)
                state=N
                ;;
            *)
                state=U
                ;;
        esac
        ;;
esac
class="xxxx ADMIN"
subclass=
handle_number=0
handle_name=$hostcheckcommand
context=$hostname
creation_date=$datum1
creation_time=$tijd
#close_date=
#close_time=
update_count=$notificationnumber
problem_owner=root
brief="Host $hostname is $hoststate - Info: $hostoutput - Time: $longtimedate"
full="Nagios - Notification Type: $notificationtype - Host: $hostname - State: $hoststate - Address: $hostaddress - Info: $hostoutput - Date/Time: $longtimedate"

echo "$event_type#$valid#xxxx_nagios_$notificationperiod#$date#$ttime#$state#$class#$subclass#$severity#$handle_number#$hostcheckcommand#$hostname#$creation_date#$creation_time###$notificationnumber#$problem_owner#$brief#$full##" > "$bestand1"
echo "$event_type#$valid#xxxx_nagios_$notificationperiod#$date#$ttime#$state#$class#$subclass#$severity#$handle_number#$hostcheckcommand#$hostname#$creation_date#$creation_time###$notificationnumber#$problem_owner#$brief#$full##" > "$bestand2"

tar cf - "$bestand1" "$bestand2" | ssh-agent ssh root@nnn.ooo.ppp.qqq "cd /; tar xf -"
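The eleven awk calls each re-split the whole argument string in a separate process. A possible alternative, sketched under the assumption that the script runs under bash: split once on the literal `%%` delimiter into an array using parameter expansion. The helper `split_fields` and the array `FIELDS` are hypothetical names; index 0 corresponds to awk's `I[1]`.

```bash
#!/bin/bash
# Split a string on the literal delimiter "%%" into the global array FIELDS.
# Hypothetical helper; relies on bash arrays and ${var%%pattern} / ${var#pattern}.
split_fields() {
    local rest=$1 sep='%%'
    FIELDS=()
    while [[ $rest == *"$sep"* ]]; do
        FIELDS+=("${rest%%"$sep"*}")   # text before the first delimiter
        rest=${rest#*"$sep"}           # drop everything up to and including it
    done
    FIELDS+=("$rest")                  # last field (no trailing delimiter)
}

split_fields "$*"
notificationperiod=${FIELDS[0]}
hoststateid=${FIELDS[1]}
notificationnumber=${FIELDS[2]}
hostcheckcommand=${FIELDS[3]}
hostname=${FIELDS[4]}
hostalias=${FIELDS[5]}
hoststate=${FIELDS[6]}
hostoutput=${FIELDS[7]}
longtimedate=${FIELDS[8]}
notificationtype=${FIELDS[9]}
hostaddress=${FIELDS[10]}
```

Like awk's `split`, this keeps empty fields between consecutive delimiters, so the field positions stay aligned with the command definition.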


#2

Wow, there's an rc1 already… this is bleeding-edge testing… :smiley:

Why do you enable flap detection and then set both the low and high thresholds to 0? If you don't use flap detection, disable it. Anyway, I don't think this has anything to do with your problem. The only difference I see is that the affected hosts use vpntunnel as their template instead of vpnrouter. Is that always the case? Are you sure you didn't write "down" instead of "DOWN" somewhere in your code? The contact groups differ too, but that should affect all notifications, not only the DOWN ones.

Luca


#3

Thanks for your input.

I had this problem with earlier versions as well, which is why I upgraded to the bleeding edge.

I set both flap detection thresholds to 0 so that Nagios falls back to its program-wide threshold values.

The problem seems to affect the hosts that used the vpntunnel template at some point.

As you can see in my code, I use $HOSTSTATEID$ to get the numeric state, so there is no DOWN or down anywhere in the code. In any case, the shell script would still write data to the temp files if it were executed at all, and it does not. I am now compiling a Nagios binary with all debugging options turned on. Perhaps that will help.
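For reference, `$HOSTSTATEID$` is 0 for UP, 1 for DOWN and 2 for UNREACHABLE, so the script's case statement handles DOWN exactly like UNREACHABLE: both take the `*)` branch. The sketch below mirrors that logic as a function (the name `classify` is hypothetical) to make the point checkable; it prints the resulting state letter and severity.

```bash
#!/bin/bash
# Mirror of the script's case statement; prints "state severity".
# $1 = $HOSTSTATEID$ (0=UP, 1=DOWN, 2=UNREACHABLE), $2 = $NOTIFICATIONNUMBER$.
classify() {
    local hoststateid=$1 notificationnumber=$2 state severity
    case $hoststateid in
        0)
            state=C
            severity=5
            ;;
        *)
            severity=1
            case $notificationnumber in
                0) state=N ;;
                *) state=U ;;
            esac
            ;;
    esac
    printf '%s %s\n' "$state" "$severity"
}

classify 1 1   # DOWN
classify 2 1   # UNREACHABLE: same branch, same output
classify 0 0   # UP
```

Since the script never distinguishes DOWN from UNREACHABLE, a difference between the two cannot come from this part of the code.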


#4

I have solved the puzzle. With all debug options turned on, Nagios gives you a full view of everything that is happening.

I could now see that for a DOWN state, $HOSTOUTPUT$ returns a message that includes the IP address in parentheses, which the shell cannot parse in an unquoted command line. After quoting the entire argument of iAMsend.sh, the problem was solved.
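The failure can be reproduced outside Nagios. When the expanded command line is run through `/bin/sh`, an unquoted `(` is a syntax error, so the shell exits before the program starts, which matches the empty debug files. A minimal sketch, with `/bin/echo` standing in for iAMsend.sh and a made-up `$HOSTOUTPUT$` value (both are assumptions for illustration):

```bash
#!/bin/bash
# Hypothetical host output containing an IP address in parentheses.
output='CRITICAL - Host Unreachable (10.0.3.5)'

# The same field list with and without quotes around the whole argument.
unquoted="/bin/echo 1%%1%%1%%check-host-alive%%DSLxxxxTunnel%%$output"
quoted="/bin/echo '1%%1%%1%%check-host-alive%%DSLxxxxTunnel%%$output'"

# Run each string the way a shell-executed command line is run.
if sh -c "$unquoted" 2>/dev/null; then
    echo "unquoted: ran"
else
    echo "unquoted: shell syntax error, command never started"
fi

if sh -c "$quoted"; then
    echo "quoted: ran"
fi
```

Single quotes only protect the argument as long as the macro output itself contains no single quote.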

Thanks again for the help.

Kind regards,

Harald Paterek