Hello all -
I am having a problem getting detailed information to show in my Nagios e-mail notifications. I would like the notifications to tell me what the actual service check was for that particular host/service and what the “warning” and “critical” thresholds are. Basically, what normal looks like and why the alert was sent.
Anyway, here are my configs for the e-mail notifications for Hosts/Services and below them is output that I am getting for each.
Yes, I am using Nagios 2, and yes, that change worked. Thanks, SonOfThunder!
I am getting output for additional information such as:
CRITICAL - Plugin timed out after 10 seconds
SNMP CRITICAL - 95
SNMP OK - 75
SNMP problem - No data received from host
etc…
As you can see, those are the output results from the plugin that I am using for a particular host or service. More than what I was getting so that’s great.
However, I would like to know if there is a way to get even more information to display there. For instance, the “SNMP CRITICAL - 95” that you see is telling me that the CPU load percentage on a machine is at 95%, which is obviously past the threshold that I have set for that machine.
Since I am not the only admin these e-mail alerts go to, I would like for the other admin to have a little more info regarding what is happening. Such as the SNMP check that is actually running for that host and what the thresholds are and any other useful info other than just the actual SNMP output info. Something a little more detailed.
The only way I can think of is to write a script per host/service that is activated when it fails. That sounds too tedious, as we are monitoring a lot of machines and services.
I was thinking maybe there is a way that I can add something to the config file per host/service, so that if it fails I would get that extra info in the notification as well. Say, a variable that is read and displayed along with $HOSTOUTPUT$ / $SERVICEOUTPUT$ when e-mail notifications are sent out?
Anyway, if you guys can think of how I can get more detailed info for my e-mail notifications, I would greatly appreciate your input. Thanks again!!!
Easy solution: just put the Nagios link in the email. If they need more info, they can check it out on the web interface.
If not, you can possibly put something more in, but I’m not sure which macros you should be looking for. I believe the command being run for the check should be available, though.
Modify the notifybyemail command and put what you want in it.
For example, you could add $HOSTCHECKCOMMAND$ or $SERVICECHECKCOMMAND$ which sounds like exactly what you wanted.
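For what it's worth, here is a rough sketch of what that could look like, based on the stock service notification command from the sample config. The hostname nagios.example.com and the /nagios/cgi-bin/ path are placeholders I made up, so adjust them (and the path to your mail binary) for your install. Keep in mind that $SERVICECHECKCOMMAND$ includes whatever arguments the service passes to the command.

```cfg
# Sketch only: service notification command with the check command and a web link added.
# nagios.example.com and the CGI path are placeholders, not real values.
define command{
	command_name	notify-by-email
	command_line	/usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\n\nService: $SERVICEDESC$\nHost: $HOSTALIAS$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\n\nDate/Time: $LONGDATETIME$\n\nCheck Command: $SERVICECHECKCOMMAND$\n\nAdditional Info:\n\n$SERVICEOUTPUT$\n\nDetails: http://nagios.example.com/nagios/cgi-bin/status.cgi?host=$HOSTNAME$\n" | /bin/mail -s "** $NOTIFICATIONTYPE$ alert - $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$ **" $CONTACTEMAIL$
	}
```

The host notification command would get the same treatment with $HOSTCHECKCOMMAND$ and $HOSTOUTPUT$.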
Thanks a bunch guys. That is exactly what I wanted. At this point, I can show the actual command that I am using for the service/host check, the output from that command and the link to the portal.
Alright guys it’s me again. One more question regarding the e-mail notifications. Is there any way to have the notifications give detailed information about what is being tested and why the alert?
For example, when checking the load on a serial interface we are using the command:
check_snmp!passwd!OID!153!191
Now in English, that command means: if the load on the serial interface reaches 60% (153), send a warning notification. If it reaches 75% (191), send a critical notification.
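The percentages line up if the OID reports load on a 0-255 scale. That scale is an assumption on my part (it is not stated above), but it is the one that makes 153 come out to exactly 60%. Spelled out as comments on a service definition fragment, where router1 is a made-up host name:

```cfg
# Fragment only: other required service directives are omitted.
# Assumes the OID returns load on a 0-255 scale.
define service{
	host_name		router1				; hypothetical host name
	service_description	Load on Serial Interface
	check_command		check_snmp!passwd!OID!153!191
	# 153 / 255 = 0.600   -> WARNING at 60% load
	# 191 / 255 = 0.749.. -> CRITICAL at roughly 75% load
	}
```

By the same arithmetic, the "SNMP WARNING - 162" style of output would correspond to about 63.5% load (162 / 255).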
So when this happens, I get a notification which looks like this…
Nagios
Notification Type: Warning
Service:Load on Serial Interface
Host: IP of router
Address: IP of router
State: WARNING
Date/Time: …
Additional Info: check_snmp!passwd!theOID!153!191
SNMP WARNING - 162
My question is, is there a way that I can get the notification to let us know, in easier to understand terms, what is going on with this host/service?
For example, can we get a notification that would say, “This host has exceeded its warning threshold of 60%,” or something similar, as opposed to the actual check?
Also, I’ve decided to take out that Macro because I didn’t want the passwd of the routers going through mail unencrypted.
Thanks guys for any additional input you can provide.
-l $ARG4$ in your check definition in checkcommands.cfg would help. Instead of your output saying “SNMP” Warning… it could say:
Threshold exceeded Warning -162
./check_snmp --help and look at the -l option
Also, you could add that macro back in, and just change your command to use one of the $USERn$ macros defined in resource.cfg.
For example:
$USER5$=mypassword
That way, all they will see in the email is $USER5$ and not the actual password.
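A minimal sketch of that idea, assuming $USER5$ is free in your resource.cfg and that $USER1$ points at your plugin directory as in the sample config. The command name check_snmp_load is made up for illustration. Here the community string is referenced inside the command definition rather than passed as an argument, so it never appears in the service's check_command at all:

```cfg
# resource.cfg (keep this file readable only by the nagios user)
$USER5$=mypassword

# checkcommands.cfg: the community string comes from $USER5$, not from an argument.
# check_snmp_load is a hypothetical command name.
define command{
	command_name	check_snmp_load
	command_line	$USER1$/check_snmp -H $HOSTADDRESS$ -C $USER5$ -o $ARG1$ -w $ARG2$ -c $ARG3$
	}

# In the service definition, the check_command then carries no password:
#   check_command	check_snmp_load!OID!153!191
```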
I should not have used $ARG4$ in my reply at all. I just copied/pasted from my config. Sorry. Tailor the command definition to your liking. All I’m saying is run ./check_snmp --help and look at what the -l option can do for you. It may give you better output than just “SNMP CRITICAL”. Instead, maybe it could say “Traffic threshold CRITICAL” or “Temperature threshold” or whatever you want it to say.
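One way to tailor it, as a sketch rather than a drop-in config: pass the label in as its own argument so each service can set its own wording. The command name check_snmp_labeled is made up, and this assumes $USER5$ holds the community string in resource.cfg and $USER1$ is the plugin directory.

```cfg
# checkcommands.cfg sketch: $ARG4$ is a free-text label handed to -l.
# check_snmp_labeled is a hypothetical command name.
define command{
	command_name	check_snmp_labeled
	command_line	$USER1$/check_snmp -H $HOSTADDRESS$ -C $USER5$ -o $ARG1$ -w $ARG2$ -c $ARG3$ -l "$ARG4$"
	}

# In the service definition:
#   check_command	check_snmp_labeled!OID!153!191!Serial load threshold
```

Going by the -l description in the help output, the label replaces the default "SNMP" prefix in the plugin output, so the alert text should start with "Serial load threshold" instead. If every service using the command can share one label, hard-coding it after -l in the command definition is simpler.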