Problems with Nagios notifications


#1

I’m having a few problems with configuring Nagios notifications.

The Nagios application itself is working perfectly; I’ve tested it extensively and can see no issues whatsoever. My problem is with notifications, specifically executing an external PHP script that generates an SMS message to notify the circuit owner. I feel the problem lies with permissions rather than anything else. Can anyone tell me what permissions should be set for an external script called from within Nagios?

root@Nagios-2:/usr/local/nagios/libexec# ls -all notify-service-by-sms
-rwxrwxrwx 1 nagios nagios 1179 2008-06-18 18:37 notify-service-by-sms

The script is currently owned by the Nagios account “nagios” and belongs to the “nagios” group. I know the PHP script has far more permissions than necessary, but I’m at my wit’s end and trying anything at this point. :?

Here are the contents of the PHP script:

[code]#!/usr/bin/php5

<?php
$dbhost = "localhost";
$dbuser = "username";
$dbpass = "password";
$dbname = "nagios";

// Circuit name, passed by Nagios as the first argument ($SERVICEDESC$)
$cct = $argv[1];
$time = date(DATE_RFC822);

$conn = mysql_connect($dbhost, $dbuser, $dbpass) or die('Error connecting to mysql');
mysql_select_db($dbname);

// Look up the "tetra" value for this circuit
$query = "SELECT tetra FROM circuits WHERE cct ='".$cct."'";
$result = mysql_query($query);

while($row = mysql_fetch_row($result))
{
    $target = "http://somedomain.com/nagios/sms.asp?c=$cct:".$row[0]."";
}

mysql_close($conn);

// Request the remote SMS page; wget appends its output to notify-by-sms.log
echo passthru('sudo wget --append-output=notify-by-sms.log --spider "'.$target.'"');
?>[/code]

As you can see, it’s a very straightforward script which does nothing more than retrieve some additional information based on what is passed in by Nagios and then call a script on another server. Due to the security restrictions on the Nagios box, this is the only option.

The output (from notify-by-sms.log):

--18:38:23--  http://somedomain.com/nagios/sms.asp?c=CIRCUITNAME:999999
           => `sms.asp?c=CIRCUITNAME:999999'
Resolving some.proxy.net... 212.x.x.x
Connecting to some.proxy.net|212.x.x.x|:3128... connected.
Proxy request sent, awaiting response... 200 OK
Length: 0 [text/html]
200 OK

The script works perfectly when executed from the shell prompt (as “root” and “nagios”) as you can see above from the wget output. I have even tried adding the accounts “nagios” and “www-data” (apache2) to the sudoers file but to no avail.

The configuration scripts belonging to Nagios are also working fine, as there is evidence in the logs that it is doing what is expected, but there is an invisible wall somewhere between Nagios and my script which is preventing it from doing its job.

In case it helps, this is what version is installed on the box.

root@Nagios-2:/usr/local/nagios/libexec# cat /proc/version
Linux version 2.6.18-6-amd64 (Debian 2.6.18.dfsg.1-18etch1) (waldi@debian.org) (gcc version 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)) #1 SMP Sun Feb 10 17:50:19 UTC 2008

Can anyone save my sanity? Am I doing something blatantly stupid? If anyone needs any more information about how everything is set up, just ask.

…and sorry for such a huge first post from a new user… :)


#2

Hey,

Can you paste in your service definition and your checkcommand definition?

Also, in the meantime, tail your nagios.log and grep for that check. Paste any results in here too.
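A minimal sketch of that log check. The default log path below is the usual location for a source install and is an assumption (the thread does not say where the log lives), and `show_notifications` is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: print the most recent alert and notification
# entries for one service from a Nagios log file.
#   $1 = service description
#   $2 = log file (the default path is an assumption)
show_notifications() {
    service=$1
    logfile=${2:-/usr/local/nagios/var/nagios.log}
    # -F treats the service name as a fixed string, not a regex
    grep -F ";${service};" "$logfile" \
        | grep -E 'SERVICE (ALERT|NOTIFICATION):' \
        | tail -n 20
}
```

Calling `show_notifications CIRCUITNAME` should list both the state change and any notification command Nagios tried to run for that service.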

Thanks


#3

define service{
    use dsl-line
    host_name 015423
    service_description CIRCUITNAME
    check_command check_dsl_line!217.x.x.x!ifOperStatus.2111!1!1!gHdFFdx!STATUS
}

‘notify-service-by-sms’ command definition

define command {
    command_name notify-service-by-sms
    command_line $USER1$/notify-service-by-sms $SERVICEDESC$
}

‘check_dsl_line’ command definition

define command{
    command_name check_dsl_line
    command_line $USER1$/check_snmp -H $ARG1$ -o $ARG2$ -w $ARG3$ -c $ARG4$ -C $ARG5$ -l $ARG6$
}

[1213809590] SERVICE ALERT: 015423;CIRCUITNAME;CRITICAL;HARD;3;STATUS CRITICAL - 2
[1213809590] SERVICE NOTIFICATION: support;015423;CIRCUITNAME;CRITICAL;notify-service-by-sms;STATUS CRITICAL - 2

As far as I can see, there is nothing wrong. The actual monitoring works fine - hence I’m lost as to what the problem is…


#4

Hi

I’m unfamiliar with PHP, but when I had a similar issue with a Perl script that simply refused to complete under Nagios, I started by echoing something to a log file at every step in the process. That way I was able to see at what stage it was failing and solve or work around the issue. As I say, I don’t know PHP, but I’d be surprised if you couldn’t do something similar: start at the beginning by echoing “Script initiated”, then confirm that $argv[1] is populated as expected by echoing it to the log, then $result, and so on. You might then be able to see where the wheels are coming off.

Just a thought. Might help, probably no harm in trying.
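The step-by-step logging idea can be sketched as a small shell wrapper around the notification script. The helper names `log_step` and `run_logged` and the default log path are hypothetical; choose a path the nagios user can write to:

```shell
#!/bin/sh
# Hypothetical debug helpers. The default log path is an assumption.
DEBUG_LOG=${DEBUG_LOG:-/tmp/notify-debug.log}

# Append a timestamped message to the debug log.
log_step() {
    printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*" >> "$DEBUG_LOG"
}

# Run a command, recording what was run, its combined
# stdout/stderr output, and its exit status.
run_logged() {
    log_step "running: $*"
    output=$("$@" 2>&1)
    status=$?
    log_step "exit=$status output=$output"
    return "$status"
}
```

A wrapper script could then call `log_step "Script initiated, arg1=$1"` followed by `run_logged /usr/local/nagios/libexec/notify-service-by-sms "$1"`, so that any error the script prints when launched by Nagios ends up in the log.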

/S


#5

I have managed to prove the script itself is passing data correctly by calling it from the shell prompt and supplying the correct argument (in this case the $SERVICEDESC$ macro). It’s something between Nagios and the script and I’m almost certain it’s something to do with permissions on the account Nagios is running under.

What permissions did you set for your Perl script so that it worked from Nagios?


#6

755. I think that if it works as the nagios user from the shell prompt, it is at least partially running from Nagios itself. That doesn’t guarantee that Nagios is passing $argv[1] as you expect, or that nothing else is going awry. In my case the problem was that Linux wouldn’t let nagios perform a sudo operation on the TTY where my GPRS modem is connected, because Nagios was not running the script in a proper shell as it would be when I ran it (successfully) from the command line. I’d never have seen the error message about this if I hadn’t started echoing back various variables, results and 2>&1 (I first typed that as &2>1; I always get it wrong :( ).
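The redirection in question is `2>&1`, and its position on the command line matters. A small illustration, where `noisy` is a made-up stand-in for a command that writes to both streams:

```shell
#!/bin/sh
# Stand-in for a failing command: one line to stdout, one to stderr.
noisy() {
    echo "to stdout"
    echo "to stderr" >&2
}

# Correct order: point stdout at the file first, then send stderr to
# wherever stdout now points. Both lines land in the file.
capture_both() {
    noisy > "$1" 2>&1
}

# Reversed order: stderr is duplicated onto the current stdout (the
# terminal or pipe), and only afterwards is stdout sent to the file.
# The error message never reaches the file.
capture_stdout_only() {
    noisy 2>&1 > "$1"
}
```

When a script is launched by a daemon there is no terminal to show stderr, so capturing it in a file with the first form is often the only way to see the error.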


#7

I’m not so sure nagios is passing $SERVICEDESC$ how you want it to. Is “CIRCUITNAME” really what it’s called or were you just filling that in for forum privacy reasons? Regardless, try this out:

Temporarily change this:

‘notify-service-by-sms’

define command {
    command_name notify-service-by-sms
    command_line $USER1$/notify-service-by-sms $SERVICEDESC$
}

to this:

‘notify-service-by-sms’

define command {
    command_name notify-service-by-sms
    command_line echo $SERVICEDESC$ > /tmp/test
}

Then cat /tmp/test to see exactly what it’s passing to your script. Maybe put some " " around $SERVICEDESC$, or use printf instead of echo, to see if there are any funky characters in there.
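A small sketch of the printf idea, piping the value through `od -c` so that stray spaces, tabs or newlines become visible. `show_bytes` is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: print a value with every byte made visible.
show_bytes() {
    # printf '%s' writes the argument verbatim: no trailing newline,
    # no interpretation of backslashes or leading dashes.
    printf '%s' "$1" | od -c
}
```

The last line of the `od` output is the total byte count, which makes a leading or trailing space easy to spot. The same pipeline could probably be used directly in the temporary command_line, for example `printf '%s' "$SERVICEDESC$" | od -c > /tmp/test`, assuming Nagios hands the line to a shell as it does for the echo test above.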

Now, if it is passing what it should, maybe it is a permissions issue like you say. Change your notification command to this:

‘notify-service-by-sms’

define command {
    command_name notify-service-by-sms
    command_line sudo $USER1$/notify-service-by-sms $SERVICEDESC$
}

And add the following line to your sudoers file:

nagios ALL=(ALL) NOPASSWD: /path/to/notify-service-by-sms

Let me know how that goes


#8

“CIRCUITNAME” is simply for privacy reasons; the actual string passed contains only alphanumeric characters, nothing which would confuse a script.

I will try this in the next 20 minutes and post my results.


#9

root@Nagios-2:/tmp# pico test
GNU nano 2.0.2 File: test

CIRCUITNAME

Nagios itself appears to be passing the correct data.


#10

It works :)

It appears it was permissions after all. Thank you for all your help. Perhaps a short tutorial on the necessary permissions required for external commands should be added to the official Nagios documentation?

Thanks again!


#11

Good to hear.
The issue you were having was likely related to both permissions and the shell. Just out of curiosity, when you were testing as the nagios user, did it drop you into a /bin/sh shell or were you in a bash shell? (Run echo $SHELL as the nagios user.)
Pretty much any application that runs as non-root and spawns external processes can run into permissions issues, so I don’t think this will ever make it into the Nagios documentation. The NOPASSWD sudo entry would be handy for a lot of people to know, though.
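One way to approximate the stripped-down context a daemon gives its child processes is to run the command under /bin/sh with an empty environment. This is only a rough sketch: it clears variables but does not reproduce the missing terminal, and `run_bare` is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: run a command line under /bin/sh with an empty
# environment, roughly as a daemon-spawned process might see it.
# This clears variables only; it does not detach the terminal.
run_bare() {
    env -i /bin/sh -c "$1"
}
```

For example, `run_bare '/usr/local/nagios/libexec/notify-service-by-sms CIRCUITNAME'` executed as the nagios user would show whether the script depends on anything that an interactive login shell sets up, such as PATH or HOME.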