No longer receiving service notifications in Nagios 2.0b4


#1

Hi all

I’m using Nagios 2.0b4 and I’ve had service notifications working, but I stopped getting them several weeks ago and I’m not sure why. I moved from a monolithic config file to several template-based configuration files around the same time, so maybe something got mixed up in the transition. Maybe something else went wrong, but either way I’m hoping fresh eyes will help find the problem.

When watching the logs and turning off a sample service, I can see the service checks fail 3 times as per max_check_attempts, but no notifications are logged and none get sent out, though the Service Details page within the web interface shows the failed service as critical.

A sample check is included here:

define service{
use generic-service ; Name of service
host_name xxx
service_description HTTP
is_volatile 0
check_period 24x7
max_check_attempts 3
normal_check_interval 5
retry_check_interval 1
contact_groups test
notification_interval 960
notification_period 24x7
check_command check_http
}

The contact group is:

define contactgroup{
contactgroup_name test
alias Nagios test
members adam
}

The contact adam is specified as:

define contact{
contact_name adam
alias Adam
service_notification_period 24x7
host_notification_period 24x7
service_notification_options w,u,c,r
host_notification_options d,r
service_notification_commands notify-by-email,notify-by-gnokii
host_notification_commands host-notify-by-email,host-notify-by-gnokii
email adam@domain.co.uk
pager 0123456789
}

The service notification commands are (apologies for any linewrapping added, all commands are on one line in the cfg):

define command{
command_name notify-by-email
command_line /usr/bin/printf "%b" "***** Nagios ***\n\nNotification Type: $NOTIFICATIONTYPE$\n\nService: $SERVICEDESC$\nHost: $HOSTALIAS$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\n\nDate/Time: $LONGDATETIME$\n\nAdditional Info:\n\n$SERVICEOUTPUT$" | /bin/mail -s "** $NOTIFICATIONTYPE$ alert - $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$ **" $CONTACTEMAIL$
}

define command{
command_name notify-by-gnokii
command_line /bin/echo "Nagios Service $NOTIFICATIONTYPE$: $SERVICEDESC$ on $HOSTALIAS$ $HOSTADDRESS$ is $SERVICESTATE$ $SHORTDATETIME$" | /usr/local/bin/gnokii --sendsms $CONTACTPAGER$
}

The host is specified as:

define host{
use generic-host ; Name of host template to use
host_name xxx
alias XXX
address 192.168.10.80
check_command check-host-alive
max_check_attempts 10
notification_interval 120
notification_period 24x7
notification_options d,r
contact_groups test
}

I am confused. Host notifications still work fine. Host and service notifications are turned on in nagios.cfg and I have read the notifications FAQ, but maybe I missed something.

Can anyone see where I have gone wrong? I would be very grateful.

Thanks,

Adam


#2

You stated that you turned off a service but that you still get host notifications just fine. Just in case you actually downed the host as well: when a service check fails, Nagios checks the host. If the host check fails too, you will be notified about the HOST being down, but NOT about the service. Obviously the service will be down if there is no host.


#3

Thanks for the reply. The host itself is fine; I have just turned off the service, i.e. /etc/init.d/httpd stop

In the web interface, the host is ok, the ping is ok, but the service (httpd for example) is critical.


#4

Ahh, OK. This is the problem.
Click on the link for the service and then click the link called “Enable notifications for this service”.
To explain why this is needed, read this.
nagios.sourceforge.net/docs/1_0/ … tion_notes


#5

Hmm, I’m not sure that is the problem. Firstly, the link to enable notifications was not there; I had a Disable Notifications link instead, as can be seen from the attached screenshot. I also turned off state retention altogether and restarted Nagios. Nagios restarted knowing nothing about the host and service states, but again the sample service was found to be critical 3 times (as per max_check_attempts) and no notifications were sent out.

Sample log output:

[1126204879] Nagios 2.0b4 starting... (PID=2929)
[1126204879] LOG VERSION: 2.0
[1126204879] Finished daemonizing... (New PID=2930)
[1126205029] SERVICE ALERT: xxx;HTTP;CRITICAL;SOFT;1;Connection refused
[1126205089] SERVICE ALERT: xxx;HTTP;CRITICAL;SOFT;2;Connection refused
[1126205149] SERVICE ALERT: xxx;HTTP;CRITICAL;HARD;3;Connection refused

After 3 attempts, notifications should be sent out. I have turned state retention back on and restarted Nagios again:

[1126213315] SERVICE ALERT: xxx;HTTP;OK;HARD;3;HTTP OK HTTP/1.1 200 OK - 349 bytes in 0.005 seconds

I’m grateful for your time and your help though.

(Incidentally, the hostname has nothing to do with what its name might imply ;))



#6

Blush, didn’t know the whole image would be displayed in the post…


#7

Incidentally, I notice that in my nagios.cfg, enable_flap_detection=0 but in the screenshot above, flap detection is enabled, as it also appears to be in var/objects.cache for all hosts and services.

Could this affect the problem I am having?


#8

[quote=“drink76”]Incidentally, I notice that in my nagios.cfg, enable_flap_detection=0 but in the screenshot above, flap detection is enabled, as it also appears to be in var/objects.cache for all hosts and services.

Could this affect the problem I am having?[/quote]

I know you think I’m off my rocker, but again, please read nagios.sourceforge.net/docs/1_0/ … tion_notes

You can turn off stuff in the nagios.cfg file and it won’t make a bit of difference unless you do as instructed. Enable/disable flap detection by using the web interface and do the same for notifications, i.e. disable/enable notifications. It only takes a couple of clicks, and you might prove me wrong, but it would still make an old man happy.
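
For reference, the nagios.cfg directives involved in this behaviour look roughly like the fragment below. It is only a sketch: the values are illustrative assumptions rather than anyone's actual settings, and the exact semantics should be checked against the state retention notes for the Nagios version in use.

```cfg
# nagios.cfg (main config file). All values are illustrative assumptions.

# When state retention is on, Nagios saves status between restarts.
retain_state_information=1

# When this is also on, program-wide settings saved in the retention
# file (such as the two below) are restored at startup and take
# precedence over what is written here.
use_retained_program_state=1

# Initial values only; they may be overridden by retained state or
# changed at runtime through the web interface.
enable_notifications=1
enable_flap_detection=0
```

This would explain how enable_flap_detection=0 in nagios.cfg can coexist with flap detection showing as enabled in the web interface.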


#9

I tried this as you suggested, but it didn’t solve the problem.

I came back to work this morning and looked around parts of the web interface that I hadn’t investigated. I noticed in the View Config page that the Notification Options were empty. Very weird, so I dropped some options into services.cfg for my sample service and all of a sudden I started getting service notifications.

I have no idea at which point these Notification Options got lost, maybe in the move from using minimal.cfg to using template based config. I can only assume that at some point, service notification options were specified somewhere and inherited by all checks.
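
A minimal sketch of what the fix might look like, for anyone who finds this thread later. The option values (w,u,c,r) are an assumption, since the exact options added are not stated here, and the template contents are hypothetical because the real generic-service template is not shown in this thread.

```cfg
# Option 1: set the options directly on the service in services.cfg.
define service{
        use                     generic-service
        host_name               xxx
        service_description     HTTP
        is_volatile             0
        check_period            24x7
        max_check_attempts      3
        normal_check_interval   5
        retry_check_interval    1
        contact_groups          test
        notification_interval   960
        notification_period     24x7
        notification_options    w,u,c,r   ; assumed values: warning, unknown, critical, recovery
        check_command           check_http
        }

# Option 2: set them once in the template so every service inherits them.
# Hypothetical template body; only the directives relevant here are shown.
define service{
        name                    generic-service
        notifications_enabled   1
        notification_options    w,u,c,r   ; assumed values
        register                0         ; template only, not a real service
        }
```

Either way, the View Config page in the web interface should then show non-empty Notification Options for the service.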

I apologise if I appear to have wasted your time, but I really had been looking at this for maybe 2 or 3 weeks before I stumbled across it this morning.

Thanks for your time :)


#10

No problem. This is simply another example of why it’s so important to go through each of your configs in detail, over and over. I’ve installed Nagios many times, and each time I follow the docs step by step and make sure that each .cfg is configured per the docs.