Nagios: nrpe and problem alerts


#1

Greetings,

I’ve recently set up a Nagios central server and a test machine with NRPE installed, so that local plugins can be run on the remote machine. This seems to work well, but I’ve got a problem I haven’t been able to solve yet:

When you check the status of a script that monitors used disk space via the web interface, you get something like this:

Root Partition WARNING 21-12-2005 14:57:02 0d 1h 5m 11s 4/4 DISK WARNING - free space: / 1416 MB (15%):

Notice that Nagios is able to display the DISK WARNING message spit out by the plugin, that’s good. But it’s not so good that Nagios sends out problem alerts like this:

***** Nagios *****

Notification Type: PROBLEM

Service: Root Partition
Host: Ural
Address: 10.2.2.242
State: WARNING

Date/Time: Wed Dec 21 13:52:41 CET 2005

Additional Info:

$

… the additional info is missing. Services that are checked locally on the Nagios main server display their additional info properly, but remote services checked via NRPE apparently do not.
Any idea on how to solve this?


#2

You would have to look at your misccommands.cfg file. You might have messed it up somehow.
It should read like this:
define command{
command_name notify-by-email
command_line /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\n\nService: $SERVICEDESC$\nHost: $HOSTALIAS$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\n\nDate/Time: $DATETIME$\n\nAdditional Info:\n\n$OUTPUT$" | /bin/mail -s "** $NOTIFICATIONTYPE$ alert - $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$ **" $CONTACTEMAIL$
}


#3

That’s basically what my notify-by-email already looks like:


define command{
command_name notify-by-email
command_line /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\n\nService: $SERVICEDESC$\nHost: $HOSTALIAS$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\n\nDate/Time: $LONGDATETIME$\n\nAdditional Info:\n\n$OUTPUT$" | /usr/bin/mail -s "** $NOTIFICATIONTYPE$ alert - $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$ **" $CONTACTEMAIL$
}

I’ve experimented a little and written a very basic script to check DNS status:

#!/bin/bash

# command line parameters
# (they most likely come from /usr/local/nagios/etc/checkcommands.cfg or company.cfg)

# DNS server to query, passed as the first argument
HOSTADDRESS=$1

# exit 0 = OK, exit 2 = CRITICAL
if host www.google.de "$HOSTADDRESS" >/dev/null; then
    echo "www.google.de resolved successfully"
    exit 0
else
    echo "Failed to resolve www.google.de!"
    exit 2
fi
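
To see what Nagios actually receives from a check like this, it can help to run it by hand and look at both the printed line and the exit status. A minimal sketch, assuming the script above is saved as ./check_dns.sh (a made-up file name); state_name and run_check are hypothetical helpers, not part of Nagios, and they only rely on the standard plugin exit codes 0, 1, 2 and 3:

```shell
#!/bin/bash

# Hypothetical helper: translate a plugin exit status into a Nagios state name.
# Anything other than 0, 1 or 2 is reported as UNKNOWN here for simplicity.
state_name() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        *) echo "UNKNOWN" ;;
    esac
}

# Hypothetical helper: run a check command, then print its state name
# followed by the first line of whatever the command printed.
run_check() {
    local output status first_line
    output=$("$@" 2>&1)
    status=$?
    first_line=$(printf '%s\n' "$output" | head -n 1)
    printf '%s: %s\n' "$(state_name "$status")" "$first_line"
}

# Example (file name and address are assumptions):
#   run_check ./check_dns.sh 10.2.0.5
```

If the line printed here looks right but the notification e-mail is still empty, the plugin itself is probably fine and the problem is more likely in the notification command.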

When executed normally, it reports its status as OK on the web interface, but the status e-mails contain only a $. Now if I play around with the path to my mail program, e.g. by changing
$LONGDATETIME$\n\nAdditional Info:\n\n$OUTPUT$" | /usr/bin/mail -s "**
to
$LONGDATETIME$\n\nAdditional Info:\n\n$OUTPUT$" | /usr/nothing -s "**
… then I get the following error output in /var/log/nagios/nagios.log:

[1135323570] Warning: Attempting to execute the command "/usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: PROBLEM\n\nService: DNS\nHost: Meskalin\nAddress: 10.2.0.5\nState: CRITICAL\n\nDate/Time: $DATETIME$\n\nAdditional Info:\n\n$OUTPUT$" | /usr/nothing -s "** PROBLEM alert - Meskalin/DNS is CRITICAL **" [email protected]" resulted in a return code of 127. Make sure the script or binary you are trying to execute actually exists…

Notice that $OUTPUT$ does not seem to get substituted. I’ve searched a lot but still haven’t been able to find where this problem comes from. :?


#4

If you’re using Nagios 2, try using $SERVICEOUTPUT$ instead of $OUTPUT$.

Luca
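
To find where the old macro is still in use, something along these lines might help. It is only a sketch: find_old_output_macro is a made-up helper name, and the path in the example is an assumption based on the default install location.

```shell
#!/bin/bash

# Hypothetical helper: list lines in the given config files that still use
# the old $OUTPUT$ macro. -F makes grep treat the pattern as a literal
# string, so the dollar signs are not read as regex anchors, and lines that
# only contain $SERVICEOUTPUT$ are not matched.
find_old_output_macro() {
    grep -n -F '$OUTPUT$' "$@"
}

# Example (path is an assumption, adjust to your installation):
#   find_old_output_macro /usr/local/nagios/etc/*.cfg
```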


#5

That did it! Yes, I’m using Nagios 2, and after I changed $OUTPUT$ to $SERVICEOUTPUT$ the additional info now displays properly in the e-mail notifications. Thanks Luca! :)
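
For anyone finding this thread later, here is roughly what the working definition looks like. This is a sketch that assumes everything else stays as in post #3; the only change is the output macro. For host notifications the counterpart macro should be $HOSTOUTPUT$, but I have not tested that here.

```
define command{
command_name notify-by-email
command_line /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\n\nService: $SERVICEDESC$\nHost: $HOSTALIAS$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\n\nDate/Time: $LONGDATETIME$\n\nAdditional Info:\n\n$SERVICEOUTPUT$" | /usr/bin/mail -s "** $NOTIFICATIONTYPE$ alert - $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$ **" $CONTACTEMAIL$
}
```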