Using Nagios 3, I’m trying to send email notifications when servers get low on disk space. I’ve googled and read so many docs and tutorials on setting this up that my eyes are bleeding, and I still cannot get Nagios to send out any emails. I’ve made the necessary entries in the contacts.cfg and contactgroups.cfg files and still get nothing. I can manually send myself an email from the command line with mailx, so I know mail itself is working and the problem is with Nagios, but for the life of me I cannot find where the problem is.
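For reference, contact and contact group entries in Nagios 3 look roughly like the sketch below. The contact name `jdoe`, the address, and the group name are hypothetical. The notification command names must match `command_name` entries that actually exist in commands.cfg. Defining a contact or group does nothing by itself. A host or service, or the template it uses, also has to list that contact or group before Nagios will notify it.

```cfg
# contacts.cfg: one person who should receive alerts (name and address are hypothetical)
define contact{
        contact_name                    jdoe                    ; hypothetical contact name
        alias                           John Doe
        host_notifications_enabled      1
        service_notifications_enabled   1
        host_notification_period        24x7                    ; must match a defined timeperiod
        service_notification_period     24x7
        host_notification_options       d,u,r
        service_notification_options    w,u,c,r
        host_notification_commands      notify-host-by-email    ; must match a command_name in commands.cfg
        service_notification_commands   notify-service-by-email
        email                           jdoe@example.com        ; hypothetical address
        }

# contactgroups.cfg: a group that hosts and services can list in contact_groups
define contactgroup{
        contactgroup_name       localadmins                     ; hypothetical group name
        alias                   Local Administrators
        members                 jdoe
        }
```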
I’m not sure what information to include here, so if there’s a critical piece anyone needs to help me troubleshoot this, let me know and I’ll post it ASAP.
Thanks in advance to anyone who can help point me in the right direction.
There are notifications showing on the notifications page, and it looks like they’re only being sent once, at 00:30:00, so I might get one tonight. While I test the system, I need them to be sent every 15 minutes or so until someone (me) acknowledges them. The end goal is to have Nagios send emails at first and then escalate to SMS text messages when the database servers drop below x GB of free space.
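In Nagios 3, repeat notifications are controlled by the service’s `notification_interval`. The value is in units of `interval_length` (60 seconds by default in nagios.cfg), so 15 means 15 minutes. A value of 0 sends only one notification per problem. Re-notification stops once the problem is acknowledged or recovers. Escalations are keyed on the notification count, not on free space. The “below x GB” condition would therefore still come from the check’s warning/critical thresholds. The sketch below assumes hypothetical names throughout: the template `frequent-notify-service`, host `db-server-01`, service `Disk Space`, and groups `localadmins`/`sms-admins`. The SMS group would also need contacts with some SMS-sending notification command.

```cfg
# Hypothetical template: services that "use" it re-notify every 15 minutes
define service{
        name                    frequent-notify-service ; hypothetical template name
        use                     generic-service         ; inherit everything else from the existing template
        notification_interval   15                      ; minutes, assuming interval_length=60
        register                0                       ; template only, not a real service
        }

# Hypothetical escalation: from the 3rd notification on, also notify an SMS group
define serviceescalation{
        host_name               db-server-01            ; hypothetical host
        service_description     Disk Space              ; hypothetical service
        first_notification      3                       ; escalation starts at notification #3
        last_notification       0                       ; 0 = applies to every notification after first_notification
        notification_interval   15
        contact_groups          localadmins,sms-admins  ; escalated notifications go only to the groups listed here
        }
```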
I came in today to find that one of the routers I monitor had about 10 minutes of high ping times. On the notifications tab there is an entry saying I was sent an email, but there’s no email from Nagios in my inbox. Does this mean Nagios is sending the emails and the problem is with Postfix/mailx, so I should be troubleshooting why it never sent out the email?
Thanks again for any help; it’s greatly appreciated.
Update: I added myself as a web interface user with full permissions, and going back through the notification logs I found:
[2010-09-02 22:34:20] Warning: Attempting to execute the command "/usr/bin/mailx "%b" "***** nagios ***\n\nNotification Type: PROBLEM\n\nService: F:\ Drive Space\nHost: \nAddress: \nState: CRITICAL\n\nDate/Time: Thu Sept 2 22:34:20 EDT 2010\n\nAdditional Info:\n\nf:\ - total: 931.51 Gb - used: 909.06 Gb (98%) - free 22.46 Gb (2%)" | /bin/mail -s " PROBLEM Service Alert: /F:\ Drive Space is CRITICAL **" root@localhost" resulted in a return code of 127. Make sure the script or binary you are trying to execute actually exists...
This happened last night after I forced it to send a custom notification, which kind of explains why I didn’t get the notification. I say “kind of” because I have no idea what the above means, but judging from “resulted in a return code of 127. Make sure the script or binary you are trying to execute actually exists…” I would guess it’s not good.
There are also two more similar messages from 3 a.m. this morning, when one of the routers I monitor had 100% packet loss for about 10 minutes. The only differences between the messages are the content of the alert and the email address it tries to send the message to. Both end with the same “resulted in a return code of 127. Make sure the script or binary you are trying to execute actually exists…” line.
After looking over the above error, I saw that the command after the pipe pointed to /bin/mail, so I changed it to /usr/bin/mailx, and now in the event log I have:
When I try that, the mailbox opens and there are 20 messages in it, some from nagios@ and some from mailer-daemon@. The ones from mailer-daemon are “undelivered mail returned to sender” messages.
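Some background on return code 127: it is what the shell reports when it cannot find the command it was asked to run. In a pipeline, the reported status comes from the last command, which here was /bin/mail and presumably doesn’t exist on this box. The logged command also has /usr/bin/mailx in the first position. The stock Nagios `notify-service-by-email` command uses `/usr/bin/printf "%b"` there to format the message body. If the definition really reads that way, that first mailx would treat "%b" and the message text as recipient addresses, which could explain the mailer-daemon bounces. A corrected definition might look like the sketch below. The binary paths are assumptions, so confirm them with `which printf mailx`. On Debian/Ubuntu packages, `nagios3 -v /etc/nagios3/nagios.cfg` checks the configuration before a restart.

```cfg
# commands.cfg: printf builds the body, mailx sends it (binary paths are assumptions)
# Avoid ';' inside command_line: Nagios treats it as the start of a comment.
define command{
        command_name    notify-service-by-email
        command_line    /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\n\nService: $SERVICEDESC$\nHost: $HOSTALIAS$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\n\nDate/Time: $LONGDATETIME$\n\nAdditional Info:\n\n$SERVICEOUTPUT$\n" | /usr/bin/mailx -s "** $NOTIFICATIONTYPE$ Service Alert: $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$ **" $CONTACTEMAIL$
        }
```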
Aha, I think I’m starting to understand why I’m not getting the custom notifications: on the notifications page in the web interface, the only contact listed is root, so the custom notifications aren’t being sent to me.
Now the question is: where is “who gets notified for custom notifications” defined? For the time being, so that I can keep troubleshooting, I’m going to change the root contact’s email to mine and see if I get the notification. If I do, that should tell me that everything is working, I think.
I think I’ve got it all figured out now. In /etc/nagios3/conf.d I found the configuration files for a couple of different things. Most importantly, there is:
contacts_nagios2.cfg – which has the “root” contact defined and is where I put my email to test the custom notifications to see if the system was working at all.
The next important file was:
generic-service_nagios2.cfg – this is where “who gets notified for custom notifications” is defined, or more precisely, who gets notified for any service that uses this template. Either way, I added the localadmins contact group that I had created in /etc/nagios3/mysite and sent out another custom notification (after changing contacts_nagios2.cfg back to root@localhost and restarting).
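Here is a sketch of the kind of change described above. Services that `use generic-service` inherit its `contact_groups`, so every such service notifies whatever groups are listed there, unless a service overrides `contact_groups` itself. This is an excerpt, not the whole file. The assumption is that `admins` is the group the template already listed. Leave the file’s other directives as they are. The `cfg_dir` line is likewise an assumption about how the /etc/nagios3/mysite directory gets loaded from nagios.cfg.

```cfg
# generic-service_nagios2.cfg (excerpt): only contact_groups changes, other directives stay as they are
define service{
        name                    generic-service
        contact_groups          admins,localadmins      ; 'admins' assumed to be the group already listed
        register                0                       ; template only, not a real service
        }

# nagios.cfg: the directory holding the localadmins definition must be loaded
cfg_dir=/etc/nagios3/mysite
```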
And everyone in the localadmins group got the emails!