Nagios E-mail Notifications


#1

Hello all -
I am having a problem getting detailed information to show in my Nagios e-mail notifications. What I would like is for the notifications to tell me what the actual service check was for that particular host/service and what the “warning” and “critical” thresholds are. Basically, what normal looks like and why the alert was sent…

Anyway, here are my configs for the e-mail notifications for Hosts/Services and below them is output that I am getting for each.

#########################
# E-mail Notifications for HOSTS
#########################
define command{
        command_name    host-notify-by-email
        command_line    /usr/bin/printf "%b" "** Nagios **\n\nNotification Type: $NOTIFICATIONTYPE$\nHost: $HOSTNAME$\nState: $HOSTSTATE$\nAddress: $HOSTADDRESS$\nInfo: $OUTPUT$\n\nDate/Time: $LONGDATETIME$\n" | /bin/mail -s "Host $HOSTSTATE$ alert for $HOSTNAME$!" $CONTACTEMAIL$
        }

############################
# E-mail notifications for SERVICES
############################
define command{
        command_name    notify-by-email
        command_line    /usr/bin/printf "%b" "** Nagios **\n\nNotification Type: $NOTIFICATIONTYPE$\n\nService: $SERVICEDESC$\nHost: $HOSTNAME$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\n\nDate/Time: $LONGDATETIME$\n\nAdditional Info:\n\n$OUTPUT$" | /bin/mail -s "** $NOTIFICATIONTYPE$ alert - $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$ **" $CONTACTEMAIL$
        }


The output I receive when an alert fires for a host or service looks like this...

** Nagios **

Notification Type: PROBLEM
Host: host.hostname.com
State: DOWN
Address: x.x.x.x
Info: $

Date/Time: Mon Jan 2 17:47:20 EST 2006


** Nagios **

Notification Type: PROBLEM

Service: Load Percentage
Host: hostname
Address: x.x.x.x
State: WARNING

Date/Time: Mon Jan 2 17:47:32 EST 2006

Additional Info:

$


Anyway, it looks like the $OUTPUT$ macro in my e-mail config is not doing the trick. Does anyone have any suggestions for getting the results I want?

Thanks in advance 8)


#2

Try using the $SERVICEOUTPUT$ and $HOSTOUTPUT$ macros instead. Works for me.
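
For reference, here is a minimal sketch of the two definitions from the first post with only the output macro swapped. It assumes Nagios 2.x, where $HOSTOUTPUT$ and $SERVICEOUTPUT$ take the place of $OUTPUT$; everything else is copied from the original configs.

```
# Host notifications: $OUTPUT$ replaced by $HOSTOUTPUT$
define command{
        command_name    host-notify-by-email
        command_line    /usr/bin/printf "%b" "** Nagios **\n\nNotification Type: $NOTIFICATIONTYPE$\nHost: $HOSTNAME$\nState: $HOSTSTATE$\nAddress: $HOSTADDRESS$\nInfo: $HOSTOUTPUT$\n\nDate/Time: $LONGDATETIME$\n" | /bin/mail -s "Host $HOSTSTATE$ alert for $HOSTNAME$!" $CONTACTEMAIL$
        }

# Service notifications: $OUTPUT$ replaced by $SERVICEOUTPUT$
define command{
        command_name    notify-by-email
        command_line    /usr/bin/printf "%b" "** Nagios **\n\nNotification Type: $NOTIFICATIONTYPE$\n\nService: $SERVICEDESC$\nHost: $HOSTNAME$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\n\nDate/Time: $LONGDATETIME$\n\nAdditional Info:\n\n$SERVICEOUTPUT$" | /bin/mail -s "** $NOTIFICATIONTYPE$ alert - $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$ **" $CONTACTEMAIL$
        }
```

Restart or reload Nagios after the change so the new command definitions are picked up.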


#3

It works as long as you are on Nagios 2 :)

Luca


#4

Yes, I am using Nagios 2, and yes, that change worked. Thanks, SonOfThunder!

I am getting output for additional information such as:

CRITICAL - Plugin timed out after 10 seconds
SNMP CRITICAL - 95
SNMP OK - 75
SNMP problem - No data received from host
etc…

As you can see, those are the output results from the plugin that I am using for a particular host or service. That is more than I was getting, so that’s great.

However, I would like to know if there is a way to get even more information to display there. For instance, the “SNMP CRITICAL - 95” that you see is telling me that the CPU load percentage on a machine is at 95%, which is obviously past the threshold that I have set for that machine.

Since I am not the only admin these e-mail alerts go to, I would like the other admins to have a little more info about what is happening: the SNMP check that is actually running for that host, what the thresholds are, and anything else useful beyond the raw SNMP output.

The only way I can think of is to write a script per host/service that is triggered when it fails. That sounds too tedious, as we monitor too many machines and services.

I was thinking maybe there is a way to add something to the config file per host/service so that, when it fails, I get that extra info in the notification as well. Say, a variable that is read and displayed along with $HOSTOUTPUT$ / $SERVICEOUTPUT$ when e-mail notifications are sent out?

Anyway, if you guys can think of how I can get more detailed info into my e-mail notifications, I would greatly appreciate your input. Thanks again!!! :)


#5

Easy solution: just put the Nagios link in the email… if they need more info they can check it out on the web interface :)
Otherwise you can possibly put something more in there, but I’m not sure which macros you should be looking for… I believe the command being run for the check should be available, though.

Luca


#6

nagios.sourceforge.net/docs/2_0/macros.html

Modify the notify-by-email command and put what you want in it.
For example, you could add $HOSTCHECKCOMMAND$ or $SERVICECHECKCOMMAND$, which sounds like exactly what you wanted.
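
A sketch of what that could look like for the service notification, based on the definition from the first post. The URL is a placeholder for wherever your Nagios web interface lives, not something from this thread. Keep in mind that $SERVICECHECKCOMMAND$ carries the check arguments too, so anything sensitive in them ends up in the mail.

```
define command{
        command_name    notify-by-email
        # Added: "Check Command" line and a link to the web interface.
        # http://nagios.example.com/nagios/ is a hypothetical URL; use your own.
        command_line    /usr/bin/printf "%b" "** Nagios **\n\nNotification Type: $NOTIFICATIONTYPE$\n\nService: $SERVICEDESC$\nHost: $HOSTNAME$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\n\nDate/Time: $LONGDATETIME$\n\nCheck Command: $SERVICECHECKCOMMAND$\n\nAdditional Info:\n\n$SERVICEOUTPUT$\n\nMore details: http://nagios.example.com/nagios/\n" | /bin/mail -s "** $NOTIFICATIONTYPE$ alert - $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$ **" $CONTACTEMAIL$
        }
```

The host version would be the same idea with $HOSTCHECKCOMMAND$ and $HOSTOUTPUT$.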


#7

Thanks a bunch guys. That is exactly what I wanted. At this point, I can show the actual command that I am using for the service/host check, the output from that command and the link to the portal.

I greatly appreciate your assistance. :)

Travis :)


#8

Alright guys, it’s me again. One more question regarding the e-mail notifications. Is there any way to have the notifications give detailed information about what is being tested and why the alert fired?

For example, when checking the load on a serial interface we are using the command:

check_snmp!passwd!OID!153!191

In English, that command means: if the load on the serial interface reaches 60% (153), send a warning notification. If it reaches 75% (191), send a critical notification.

So when this happens, I get a notification which looks like this…


Nagios
Notification Type: Warning
Service: Load on Serial Interface
Host: IP of router
Address: IP of router
State: WARNING

Date/Time: …

Additional Info: check_snmp!passwd!theOID!153!191

SNMP WARNING - 162

My question is, is there a way that I can get the notification to tell us, in easier-to-understand terms, what is going on with this host/service?

For example, can we get a notification that would say, “This host has reached its warning threshold of more than 60%,” or something similar, as opposed to the actual check?

Also, I’ve decided to take out the check command macro because I didn’t want the router passwords going through mail unencrypted.

Thanks guys for any additional input you can provide.
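
For anyone reading the raw numbers in the alert above: the 60% → 153 and 75% → 191 pairs suggest the OID reports load on a 0-255 scale. That is an inference from the thresholds, not something stated in the post, so treat this small shell sketch as an illustration only. The function name `raw_to_percent` is made up for the example.

```shell
#!/bin/sh
# Assumption: the OID returns interface load on a 0-255 scale (255 = 100%).
# Convert a raw value to a whole-number percentage, rounded to nearest.
raw_to_percent() {
    echo $(( ($1 * 100 + 127) / 255 ))
}

raw_to_percent 153   # warning threshold from the check
raw_to_percent 191   # critical threshold from the check
raw_to_percent 162   # value reported in the sample alert
```

Under that assumption, the "SNMP WARNING - 162" in the sample alert is a little above the 60% warning line and still below the 75% critical line.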


#9

Adding -l $ARG4$ to your check definition in checkcommands.cfg would help. Instead of your output saying “SNMP WARNING”… it could say:
Threshold exceeded WARNING - 162
Run ./check_snmp --help and look at the -l option.


#14

Also, you could add that macro back in and just change your check command to use one of the $USERn$ macros defined in resource.cfg.
For example:
$USER5$=mypassword
Then all they will see in the email is $USER5$ and not the actual password.
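
A rough sketch of how the two pieces might fit together. The host name and the remaining service directives are placeholders, and whether the mail really shows the literal $USER5$ rather than the expanded value is worth confirming on your own install before relying on it.

```
# resource.cfg
# Assumption: $USER5$ is not already used for something else.
$USER5$=mypassword

# services.cfg (hypothetical host name; other required directives omitted)
define service{
        host_name               router1
        service_description     Load on Serial Interface
        # The community string is referenced by macro instead of written out.
        check_command           check_snmp!$USER5$!OID!153!191
        }
```

It also helps to make resource.cfg readable only by the Nagios user, since that is now where the password lives.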


#15

jakkedup,
are you telling me that I have to use the -l option before $ARG4$?
Here is my checkcommands.cfg entry for that command:

define command{
        command_name    check_snmp
        command_line    $USER1$/check_snmp -H $HOSTADDRESS$ -C $ARG1$ -o $ARG2$ -w $ARG3$ -c $ARG4$
        }

Here is how I interpret what you’re telling me to do…


define command{
        command_name    check_snmp
        command_line    $USER5$/check_snmp -H $HOSTADDRESS$ -C $ARG1$ -o $ARG2$ -w $ARG3$ -c -l "custom string" $ARG4$
        }

Of course that doesn’t work, because I’m sure I’ve misinterpreted what you mean.

If you can take a look at this and clarify, I would greatly appreciate it.

Thanks!!!


#16

I should not have used $ARG4$ in my reply at all. I just copied and pasted from my config. Sorry. Tailor the command definition to your liking. All I’m saying is run ./check_snmp --help and look at what the -l option can do for you. It may give you better output than just “SNMP CRITICAL”. Instead it could say “Traffic threshold CRITICAL” or “Temperature threshold” or whatever you want it to say.
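
One way to wire that up, as a sketch rather than the only option: keep -c paired with its threshold and pass the label as an extra argument. The command name check_snmp_labeled and the use of $ARG5$ for the label are choices made for this example, not anything from the posts above, and the label text is just a sample.

```
# checkcommands.cfg
define command{
        command_name    check_snmp_labeled
        # -w and -c keep their own values; -l takes the label as a fifth argument.
        command_line    $USER1$/check_snmp -H $HOSTADDRESS$ -C $ARG1$ -o $ARG2$ -w $ARG3$ -c $ARG4$ -l "$ARG5$"
        }

# In the service definition (other directives omitted):
#   check_command   check_snmp_labeled!passwd!OID!153!191!Serial load threshold
```

With that in place the plugin output, and therefore $SERVICEOUTPUT$ in the mail, should start with the label in place of “SNMP”. Run the plugin by hand first to see the exact wording your version produces.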