Problems sending sms with Nagios 2.0b4 and gnokii 0.67


#1

Hi all,

I’ve recently set up Nagios 2.0b4 and all is going well, except that I can’t get gnokii 0.67 to send SMS messages to me. That is kinda critical, as I’ve got 55 servers to manage and a life to lead away from checking email. I’ve tried several approaches, including the one listed in the FAQ at

http://www.nagios.org/faqs/viewfaq.php?faq_id=220

If you’ll forgive me for posting swathes of config, I hope somebody will be gracious enough to look over it for me. I’ve lost nearly 2 weeks trying to get it up and maybe fresh eyes will help. Incidentally, email notifications work fine.

define contactgroup{
contactgroup_name test
alias Nagios test
members adam
}

Obviously not the real contact details

define contact{
contact_name adam
alias Adam
service_notification_period 24x7
host_notification_period 24x7
service_notification_options w,u,c,r
host_notification_options d,r
service_notification_commands notify-by-email,notify-by-gnokii
host_notification_commands host-notify-by-email,host-notify-by-gnokii
email [email protected]
pager 01234567890
}

#Forgive the linewrapping below if any, also note that I have commented out lines that also don’t work when swapped in

define command{
command_name notify-by-gnokii

command_line /bin/echo "Nagios Service Alert: $SERVICEDESC$ on $HOSTALIAS$ $HOSTADDRESS$ is $SERVICESTATE$; $SERVICEOUTPUT; $LONGDATETIME$" | /usr/local/bin/gnokii --sendsms $CONTACTPAGER$

command_line	/usr/local/nagios/etc/notify-by-gnokii $CONTACTPAGER$ "Nagios Service Alert: $SERVICEDESC$ on $HOSTALIAS$ $HOSTADDRESS$ is $SERVICESTATE$; $SERVICEOUTPUT$; $LONGDATETIME$"
}

define command{
command_name host-notify-by-gnokii

command_line /bin/echo "Nagios Host Alert: $HOSTNAME$ $HOSTADDRESS$ is $HOSTSTATE$; $HOSTOUTPUT$; $LONGDATETIME$" | /usr/local/bin/gnokii --sendsms $CONTACTPAGER$

command_line	/usr/local/nagios/etc/notify-by-gnokii $CONTACTPAGER$ "Nagios Host Alert: $HOSTALIAS$ $HOSTADDRESS$ is $HOSTSTATE$; $HOSTOUPUT$; $LONGDATETIME$"
}

#Test check

define service{
use generic-service
host_name xxx
service_description HTTP
is_volatile 0
check_period 24x7
max_check_attempts 3
normal_check_interval 5
retry_check_interval 1
contact_groups test
notification_interval 960
notification_period 24x7
notification_options w,u,c,r
check_command check_http
}

#The all important notify-by-gnokii script

#!/bin/sh
# Gnokii Plugin script
# (c) Horst venzke
# v 0.1 - 17.01.2004

mess=$2
number=$1
echo $mess | gnokii --sendsms $number

This script is /usr/local/nagios/etc/notify-by-gnokii, is executable and is owned by nagios:nagios. As a test, I added in a

mail [email protected] -s gnokii < /dev/null

line into the notify-by-gnokii script and I don’t get that mail so the script isn’t getting run properly.
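For anyone debugging the same thing, a logging variant of the script can show whether it is invoked at all and what gnokii returns. This is only a minimal sketch: the LOG and SENDER variables and the notify function are hypothetical names added for illustration, and the absolute gnokii path is assumed from the command definitions above.

```shell
#!/bin/sh
# Hypothetical debugging variant of notify-by-gnokii.
# LOG and SENDER are illustrative names, not part of the original script.
LOG="${LOG:-/tmp/notify-by-gnokii.log}"
SENDER="${SENDER:-/usr/local/bin/gnokii}"

notify() {
    number=$1
    mess=$2
    # Record that the script was called, and with which arguments.
    echo "$(date) called: number=$number mess=$mess" >> "$LOG"
    # Send the message; keep the sender's output and errors in the log.
    echo "$mess" | "$SENDER" --sendsms "$number" >> "$LOG" 2>&1
    status=$?
    # Record the exit status of the sender command.
    echo "$(date) exit status: $status" >> "$LOG"
    return $status
}

# Only send when a number and a message were supplied.
if [ $# -ge 2 ]; then
    notify "$1" "$2"
fi
```

If the log never gains a "called" line, the script is not being started at all. An exit status of 127 is the shell's conventional "command not found" code, which is why the sketch uses an absolute path rather than relying on the PATH that the nagios user gets.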

Making a test script which is the same but resolves the variables within the file and is executed on the command line by user nagios works fine. I have also seen the variables resolve in the config file, though they no longer appear since I moved the config out to separate files.

I can see the script entries in my logs, running a

watch tail /usr/local/nagios/var/nagios.log
shows the script zip through, which is wrong as manually sending an sms using gnokii takes about 50 seconds (I will have to look at this too, but I have upped the nagios command time out to around 3 minutes for the time being to factor this out as a problem).

Here is a log snippet:

[1124113032] SERVICE ALERT: xxx;HTTP;CRITICAL;SOFT;1;Connection refused
[1124113092] SERVICE ALERT: xxx;HTTP;CRITICAL;SOFT;2;Connection refused
[1124113152] SERVICE ALERT: xxx;HTTP;CRITICAL;HARD;3;Connection refused
[1124113152] SERVICE NOTIFICATION: adam;xxx;HTTP;CRITICAL;notify-by-gnokii;Connection refused
[1124113152] SERVICE NOTIFICATION: adam;xxx;HTTP;CRITICAL;notify-by-email;Connection refused
[1124122152] SERVICE ALERT: xxx;HTTP;OK;HARD;3;HTTP OK HTTP/1.1 200 OK - 349 bytes in 0.010 seconds
[1124122152] SERVICE NOTIFICATION: adam;xxx;HTTP;OK;notify-by-gnokii;HTTP OK HTTP/1.1 200 OK - 349 bytes in 0.010 seconds
[1124122152] SERVICE NOTIFICATION: adam;xxx;HTTP;OK;notify-by-email;HTTP OK HTTP/1.1 200 OK - 349 bytes in 0.010 seconds

Obviously the service check and the machine in question are written into the various groups and check commands correctly; as I said, email notifications work. I just thought it better to omit config that is easily implied.

If anyone is able to advise I would be very grateful, and I can post further config if necessary.

Thanks,

Adam


#2

Apologies, some of the tags seem to have messed up, notably the code ones, and the url one seems to have left a < on the end of the link at the top to the gnokii nagios faq.


#3

OK, I have ruled out notification_timeout as the cause. I changed the connection type in gnokii and now sms messages are sent in 5 or 6 seconds, but the problem remains. I changed notification_timeout back to 60 seconds in nagios.cfg.


#4

OK, problem solved. Not only does the notify-by-gnokii script taken from http://www.nagios.org/faqs/viewfaq.php?faq_id=220 fail to execute with a code 127 regardless of permissions, but some of the nagios variables don’t exist ($OUTPUT$ and $DATETIME$ spring immediately to mind), and the ; symbols used to separate the data cause the whole thing to fail silently even when you echo the relevant details to gnokii without using the script. In fact, echoing the information to gnokii using a pipe and removing the semicolons causes a lot more logging to go on. Go figure.
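A likely explanation for the semicolon behaviour is that Nagios object configuration files treat everything after a ; on a line as a comment, so the command_line would be cut short before it ever reaches the shell. The sketch below only simulates that cut with sed. It is not Nagios's real parser, and strip_after_semicolon is a made-up helper name.

```shell
#!/bin/sh
# Rough simulation only: this is NOT Nagios's real parser.
# It cuts a config line at the first ';', as an inline comment would be.
strip_after_semicolon() {
    printf '%s\n' "$1" | sed 's/;.*//'
}

# Single quotes keep the shell from touching the $MACRO$ names.
line='command_line /bin/echo "$HOSTNAME$ is $HOSTSTATE$; $HOSTOUTPUT$" | /usr/local/bin/gnokii --sendsms $CONTACTPAGER$'

# Print what would be left of the command definition.
strip_after_semicolon "$line"
```

What survives ends at $HOSTSTATE$: the closing quote and the pipe to gnokii are gone, which would fit a command that never sends anything.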

For anyone seeing this problem themselves, my final pager notification commands were (forgive the line breaks and formatting):

define command{
command_name notify-by-gnokii
command_line /bin/echo "Nagios Service Alert: $NOTIFICATIONTYPE$: $SERVICEDESC$ on $HOSTALIAS$ $HOSTADDRESS$ is $SERVICESTATE$ $SHORTDATETIME$" | /usr/local/bin/gnokii --sendsms $CONTACTPAGER$
}

define command{
command_name host-notify-by-gnokii
command_line /bin/echo "Nagios Host Alert: $HOSTNAME$ $HOSTADDRESS$ is $HOSTSTATE$ $HOSTOUTPUT$ $SHORTDATETIME$" | /usr/local/bin/gnokii --sendsms $CONTACTPAGER$
}

All bar some final tweaking to make the sms read easily.
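On the tweaking side, one thing worth checking is length, since a single SMS in the default GSM alphabet holds 160 characters and $HOSTOUTPUT$ can be long. A small sketch of trimming a message before it is piped to gnokii follows. The trim_sms helper and MAX_SMS_LEN variable are hypothetical, and whether a longer message would be split or rejected on a given phone is not something this thread establishes.

```shell
#!/bin/sh
# Hypothetical helper: keep only the first 160 characters of an alert
# so that it fits into a single SMS (assumes a single-line message).
MAX_SMS_LEN=160

trim_sms() {
    printf '%s' "$1" | cut -c1-"$MAX_SMS_LEN"
}

# Example: a short alert passes through unchanged.
trim_sms "Nagios Host Alert: xxx is DOWN"
```

The trimmed text could then be piped to /usr/local/bin/gnokii --sendsms in place of the plain /bin/echo.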


#5

Nagios 2 changed some variable names… like LASTCHECK became LASTSERVICECHECK… Nagios 2.0b4 changed one more (have a look at the change list). Possibly the script was created for Nagios 1 (same problems you get with nagiostat on Nagios 2 :) ).

Luca
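A quick way to spot leftovers from older configs is to grep for the old names. This sketch only looks for the three macros mentioned in this thread ($OUTPUT$, $DATETIME$ and $LASTCHECK$), so it is not a complete list of renamed macros, and find_old_macros is just an illustrative name.

```shell
#!/bin/sh
# Hypothetical check: list lines of a config file that still use
# the old macro names mentioned in this thread.
find_old_macros() {
    # -n prints line numbers; the leading \$ stops $SERVICEOUTPUT$
    # or $LONGDATETIME$ from matching.
    grep -n -E '\$(OUTPUT|DATETIME|LASTCHECK)\$' "$1"
}

# Usage: sh thisfile.sh /path/to/commands.cfg
if [ $# -ge 1 ]; then
    find_old_macros "$1"
fi
```

grep exits with status 1 when nothing matches, so the function can also be used in an if to confirm that a file is clean.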