Host goes down, no notifications sent

njcwotx · February 16, 2010, 12:05am

I confirmed that it sends emails, and I CAN send a custom notification on demand and get an email. When a server goes offline, however, it does not send a notification or log an entry. Below are my notification settings.

______________ /etc/nagios/nagios.cfg ____________________
log_notifications=1
execute_service_checks=1
accept_passive_service_checks=1
execute_host_checks=1
accept_passive_host_checks=1
enable_notifications=1
enable_event_handlers=1

______________ /etc/nagios/nagiosql/hosttemplates.cfg ____________________

define host {
name generic-host
max_check_attempts 1
check_period 24x7
event_handler_enabled 1
flap_detection_enabled 1
process_perf_data 1
retain_status_information 1
retain_nonstatus_information 1
contact_groups admins
notification_period 24x7
notification_options d,u,r,f,s
notifications_enabled 1
register 0

}

______________ /etc/nagios/nagiosql/services/internet.cfg ___________________
define service {
hostgroup_name *
service_description PING
servicegroups Ping
use generic-service
check_command check_ping!200.0,20%!600.0,60%
register 1
}

______________ /etc/nagios/nagiosql/contacts.cfg ____________________

define contact {
contact_name name
alias name
host_notification_period 24x7
service_notification_period 24x7
host_notification_commands notify-host-by-email,notify-service-by-email
service_notification_commands notify-host-by-email,notify-service-by-email
email [email protected]
}

______________ /etc/nagios/nagiosql/servicetemplates.cfg ____________________

define service {
name generic-service
is_volatile 0
max_check_attempts 3
check_interval 10
retry_interval 2
active_checks_enabled 1
passive_checks_enabled 1
check_period 24x7
obsess_over_service 1
check_freshness 0
event_handler_enabled 1
flap_detection_enabled 1
process_perf_data 1
retain_status_information 1
retain_nonstatus_information 1
notification_interval 60
notification_period 24x7
notification_options w,u,r,c
notifications_enabled 1
contact_groups admins
failure_prediction_enabled 1
register 0

}

luca · February 16, 2010, 2:40pm

Is the contact “name” in the contactgroup admins?

njcwotx · February 16, 2010, 3:46pm

“name” stands in for the real name. Yes, the contact works. If I go to the host and send a custom alert, the email comes through to my inbox and shows in the log.

The puzzling part is when I simulate an alert by shutting down the test server, I see the host go down, but the notification event does not show up in nagios. I would understand a contact/email setup issue if the notification viewer showed an event, but the point is it does not even show up there.

This can get confusing, but pay attention to the detail here: when I check the “NOTIFICATIONS” page in nagios, I DO NOT see the event generated; however, if I check the “EVENT LOG” page in nagios, it does appear to show an event. See below:

[02-16-2010 09:44:44] HOST EVENT HANDLER: SERVER;DOWN;HARD;1;notify-host-by-email
[02-16-2010 09:44:44] HOST ALERT: SERVER;DOWN;HARD;1;(Host Check Timed Out)
[02-16-2010 09:44:04] SERVICE EVENT HANDLER: SERVER;PING;CRITICAL;SOFT;1;notify-service-by-email
[02-16-2010 09:44:04] SERVICE ALERT: SERVER;PING;CRITICAL;SOFT;1;PING CRITICAL - Packet loss = 100%
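
The same distinction can be checked from the raw log file by counting notification entries against event-handler entries. This is a minimal sketch, not something taken from the thread: `count_log_events` is a hypothetical helper name, the log path is an assumption that varies by install, and it assumes automatic notifications are logged with a `HOST NOTIFICATION:` or `SERVICE NOTIFICATION:` prefix.

```shell
#!/bin/sh
# Count notification entries versus event-handler entries in a Nagios log.
# Usage: count_log_events /path/to/nagios.log   (the path is an assumption)
count_log_events() {
    log_file="$1"
    # grep -c still prints 0 when nothing matches, but exits non-zero,
    # so "|| true" keeps the function usable under "set -e".
    notifications=$(grep -c -E '(HOST|SERVICE) NOTIFICATION:' "$log_file" || true)
    handlers=$(grep -c -E '(HOST|SERVICE) EVENT HANDLER:' "$log_file" || true)
    echo "notifications=$notifications event_handlers=$handlers"
}
```

If the handler count rises with each outage while the notification count stays flat, the email commands are probably being run as event handlers rather than as notification commands, which would match the log lines shown here.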

njcwotx · February 16, 2010, 3:48pm

Here is what shows in “NOTIFICATION” when I send a custom alert and I do receive the email:

Host    Service  Type           Time                 Contact        Notification Command     Information
SERVER  N/A      CUSTOM (DOWN)  02-15-2010 17:48:29  Name ofPerson  notify-host-by-email     (Host Check Timed Out)
SERVER  N/A      CUSTOM (DOWN)  02-15-2010 17:48:29  Name ofPerson  notify-service-by-email  (Host Check Timed Out)
SERVER  N/A      CUSTOM (UP)    02-15-2010 14:41:41  Name ofPerson  notify-host-by-email     nagios
SERVER  N/A      CUSTOM (UP)    02-15-2010 14:41:40  Name ofPerson  notify-service-by-email  nagios
SERVER  N/A      CUSTOM (UP)    02-15-2010 14:34:27  Name ofPerson  notify-host-by-email     nagios
SERVER  N/A      CUSTOM (UP)    02-15-2010 14:34:27  Name ofPerson  notify-service-by-email  nagios

luca · February 16, 2010, 4:00pm

I don’t think it’s the problem, but why do you have a blank line in the hosttemplate definition after register?

Could you copy a host definition too?

njcwotx · February 16, 2010, 4:25pm

I’m using nagiosql to generate configs, so that blank line is created there.

______________ /etc/nagios/nagiosql/hosts/SERVER.cfg ____________________

define host {
host_name SERVER
alias SERVER
address SERVER.domain.com
hostgroups Windows-Servers
use generic-host
register 1
}

njcwotx · February 16, 2010, 4:28pm

I edited the blank lines out manually and restarted nagios, and will test without the blanks. If that works, I’ll delete the templates entirely and redo them.

njcwotx · February 16, 2010, 4:40pm

Editing out the blank lines made no difference.

njcwotx · February 16, 2010, 4:48pm

Well, I know flap detection works: I got events for that, and I had to disable flap detection.

njcwotx · February 16, 2010, 5:01pm

OK, I’m now looking at sendmail. Checking maillog, I can see that emails go to the correct host when I send a custom alert, but when the automatic notifications go through, they go to localhost. I’m investigating that end now.

luca · February 16, 2010, 5:07pm

if it doesn’t show up in the notifications page it doesn’t make much sense to look in sendmail…

njcwotx · February 16, 2010, 5:43pm

I agree that if it’s not showing in notifications, that’s a problem, but my /var/log/maillog shows that emails from custom alerts and from the command line go to my exchange server, while emails that come from the event handler go to localhost (aka nowhere). I am trying to figure that out. It might be how the email is addressed.

njcwotx · February 16, 2010, 7:56pm

digging deeper into this…

OK, I switched from sendmail to ssmtp and got the same result. I talked to my exchange admin and discovered that:

The email IS being sent out automatically.
The recipient is jacked up: it’s sending it automatically to "[email protected]" instead of "[email protected]"!
When I hit custom it goes to [email protected] and it shows up in my inbox.

Judging from the recipient, I have to have a typo somewhere.

njcwotx · February 16, 2010, 8:08pm

Here is my notification command… it looks OK, but it might be forest and trees at this point…

/usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\nHost: $HOSTNAME$\nState: $HOSTSTATE$\nAddress: $HOSTADDRESS$\nInfo: $HOSTOUTPUT$\n\nDate/Time: $LONGDATETIME$\n" | /bin/mail -s "** $NOTIFICATIONTYPE$ Host Alert: $HOSTNAME$ is $HOSTSTATE$ **" $CONTACTEMAIL$

luca · February 16, 2010, 9:53pm

looks good but just as a test if you wish here’s my notify-host-by-email (check the path to mail which is different)

/usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\nHost: $HOSTNAME$\nState: $HOSTSTATE$\nAddress: $HOSTADDRESS$\nInfo: $HOSTOUTPUT$\n\nDate/Time: $LONGDATETIME$\n" | /usr/bin/mail -s "** $NOTIFICATIONTYPE$ Host Alert: $HOSTNAME$ is $HOSTSTATE$ **" $CONTACTEMAIL$

njcwotx · February 16, 2010, 11:54pm

Tried your notify-host-by-email and it is the same story…

I looked in the history, and it appears notifications actually worked as far back as the 9th. I must have jacked up part of the configuration somewhere in nagiosql. This is a new build and I’m still getting the templates and logic worked out, so I must have broken it along the way. At least I now have a direction.

I noticed that in some of the test emails I get a literal $ in some places; I think that indicates a variable is empty. Maybe the [email protected] is telling me I have some link missing and it’s not getting the right contact. At least that is my direction right now. I might just remove all the contact stuff and redo it.
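
One way to test the suspicion above is to scan a received message for leftover `$NAME$` tokens. This is a minimal sketch, assuming unexpanded macros survive as literal `$NAME$` text in the message (the thread suggests this but does not confirm it); `find_unexpanded_macros` is a hypothetical helper name.

```shell
#!/bin/sh
# Print every leftover $MACRO$ token found on standard input, one per line.
# Prints nothing when no such token is present.
find_unexpanded_macros() {
    # grep exits non-zero when nothing matches; "|| true" keeps that from
    # being treated as a failure by the caller.
    grep -o -E '\$[A-Z][A-Z0-9_]*\$' || true
}

# Example (the file name is an assumption): check a saved copy of a test email.
# find_unexpanded_macros < saved_email.txt
```

Any token it prints names a macro that had no value in the context the command ran in, which narrows down which object link is missing.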

PS, the nagiosql.org site has appeared down for a few days. Do you have any idea if that project is still available? The demo is up and I get a mysql error, so I assume it’s not completely down. I noticed this last week. I tried emailing the project creator but I got a returned email…

luca · February 17, 2010, 8:56am

No ideas as far as nagiosql is concerned. I think I tested it some 3 or 4 years ago and trashed it after a couple of days because it ruined config files now and then. At least it did at the time, and it looks like something is still the same.

What nagios version are you running? some macros changed in the jump between 2.x and 3.x… could it be you still have a 2.x core?

njcwotx · February 17, 2010, 5:26pm

it was a fresh 3.2.0 install.

It took me a while to get nagiosql figured out, but I have been making lots of changes lately.

I’m not necessarily in love with nagiosql, but one of the requirements of the project is the ability to administer it over the web. I have the routine down now; in fact, I was pretty close to turning it loose on monitoring before I realized no more alerts were coming out. If there are any good alternative web GUI management tools out there, I can look at them on the testing box.

luca · February 17, 2010, 5:28pm

sorry can’t help you there… i always returned to the command line interface

njcwotx · February 17, 2010, 5:46pm

I managed to get it working. I had to make a bunch of tweaks to the contact, contact group and template, with a lot of overkill… now I only need to clean it up and see what breaks it. Last night I figured the $ in the email address must have been telling me my $CONTACTEMAIL$ was jacked up, so I went on an all-out blitz to redo the contact stuff.
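
Since `$CONTACTEMAIL$` is filled from the contact's `email` directive, listing each contact next to that directive can narrow the cleanup. This is a minimal sketch, assuming the one-directive-per-line layout shown in contacts.cfg earlier in the thread. `list_contact_emails` is a hypothetical helper, it reads only the first word of `contact_name`, and it does not check contact group membership.

```shell
#!/bin/sh
# Print "contact_name<TAB>email" for each "define contact" block in a file.
# A contact without an email directive is reported as MISSING.
# Usage: list_contact_emails /etc/nagios/nagiosql/contacts.cfg
list_contact_emails() {
    awk '
        # Start of a contact block ("define contactgroup" does not match).
        /^[ \t]*define[ \t]+contact[ \t]*\{/ { in_contact = 1; name = ""; email = "MISSING"; next }
        in_contact && $1 == "contact_name"   { name = $2 }
        in_contact && $1 == "email"          { email = $2 }
        # Closing brace ends the block and emits one line for it.
        in_contact && /^[ \t]*\}/            { print name "\t" email; in_contact = 0 }
    ' "$1"
}
```

A contact that shows up as MISSING here is one candidate for a broken recipient.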

A few years ago I had nagios running well with some nice MRTG graphs, but the boss didn’t like that he could not go to a web page and add/delete hosts or let the helpdesk do some maintenance. I work with anti-commandline windowsy types. One day I came into work and he had given the host to somebody else for a project and didn’t say anything to me. I only had puzzlement when I went looking for the site. We used solarwinds after that; it was a decent product, but we outgrew it unless we wanted to pony up some bigger bucks. Now they are willing to try this again, but with the stipulation that I have to have a GUI front end that the average Joe can maintain.

nagiosql is OK; it took a while to figure out a routine and logic for maintaining it. It’s not that forgiving of mistakes though, no idiot proofing. I at least have it to where if a guy logs in and adds/deletes a host, they don’t have to put in anything but a server name and attach a template.

Once I have monitoring back in place, I was going to make a secondary box for testing new things and add ons.