Host down notifications sent once only - why?


#1

Hi,

I’ve got Nagios 1.2 monitoring a couple of services on ~200 servers, all using passive checks. This works fine. Checks happen every 2 hours.

Now I need to check host availability for these servers more frequently, and there are some additional non-server sites that we need to monitor. If a site is down, Nagios should send periodic notifications to a group of users.

So I thought I’d use active check_ping services for that.

As a start, I added a check_ping service to one test host and brought the host down.

I received a host down notification, but only once! I tried restarting Nagios after clearing the MySQL tables and /usr/local/nagios/var, but that did not help.
Also, I did not receive any service critical or unreachable notification for the check_ping service itself.

The “notification_interval” parameter in both the host and service definitions is set to a non-zero value.

What could be the problem?

Below are my host, service and check command definitions.

Thanks!

-=-=-=-=-=-=-=-=- hosts.cfg -=-=-=-=-=-=-=-=-
define host {
name generic-host
event_handler_enabled 0
flap_detection_enabled 0
process_perf_data 1
retain_status_information 1
retain_nonstatus_information 1
register 0
}
define host {
use generic-host
host_name cgbkhusrvray00
alias test server
address 10.10.1.200
check_command check-host-alive
max_check_attempts 3
notification_interval 30
notification_period 24x7
notification_options d,u,r
}

=-=-=-=-=-=-=-=-=- services.cfg -=-=-=-=-=-=-=-=-=-
define service {
name generic-service
active_checks_enabled 1
passive_checks_enabled 1
parallelize_check 1
obsess_over_service 1
check_freshness 0
notifications_enabled 1
event_handler_enabled 1
flap_detection_enabled 1
process_perf_data 1
retain_status_information 1
retain_nonstatus_information 1
register 0
}
define service {
name check_ping_tradelink
use generic-service
service_description CHECK_PING
is_volatile 0
check_period 24x7
max_check_attempts 1
normal_check_interval 2
retry_check_interval 1
contact_groups sun-admins
notification_interval 5
notification_period 24x7
notification_options c,w,r
passive_checks_enabled 0
active_checks_enabled 1
check_freshness 0
check_command check_ping
register 0
}

define service {
use check_ping_tradelink
host_name cgbkhusrvray00
register 1
}

=-=-=-=-=-=-=-=-=- checkcommands.cfg -=-=-=-=-=-=-=-=-=-
define command {
command_name check_ping
command_line $USER1$/check_ping -H $HOSTADDRESS$ -w 1500,80% -c 3000,100% -p 1 -t 30
}
define command {
command_name check-host-alive
command_line $USER1$/check_ping -H $HOSTADDRESS$ -w 1000.0,80% -c 5000.0,100% -p 1 -t 30
}

=-=-=-=-=-=-=-===-==- host state info -=-=-=-==
Host Status: DOWN
Status Information: PING CRITICAL - Packet loss = 100%
Last Status Check: 13-10-2005 13:01:30
Status Data Age: 0d 0h 2m 1s
Last State Change: 12-10-2005 21:15:49
**Current State Duration: 0d 15h 47m 42s**
Last Host Notification: 12-10-2005 21:15:49
**Current Notification Number: 1**
Is This Host Flapping? N/A
Percent State Change: N/A
In Scheduled Downtime? NO
Last Update: 13-10-2005 13:03:08

Host Checks: ENABLED
Host Notifications: ENABLED
Event Handler: DISABLED
Flap Detection: DISABLED

=-=-=-====-=-= service state info =-=-=-=-
Current Status: CRITICAL
Status Information: PING CRITICAL - Packet loss = 100%
Current Attempt: 1/1
State Type: HARD
Last Check Type: ACTIVE
Last Check Time: 13-10-2005 13:02:59
Status Data Age: 0d 0h 1m 47s
Next Scheduled Active Check: 13-10-2005 13:04:59
Latency: < 1 second
Check Duration: 10 seconds
Last State Change: 12-10-2005 21:15:51
**Current State Duration: 0d 15h 48m 55s**
Last Service Notification: N/A
**Current Notification Number: 0**
Is This Service Flapping? N/A
Percent State Change: N/A
In Scheduled Downtime? NO
Last Update: 13-10-2005 13:04:39

Service Checks: ENABLED
Passive Checks: DISABLED
Service Notifications: ENABLED
Event Handler: ENABLED
Flap Detection: ENABLED

=-=-=-=-==- service state info for a passive check -=-===-=
Current Status: CRITICAL
Status Information: CRITICAL: Service results are stale!
Current Attempt: 1/1
State Type: HARD
Last Check Type: ACTIVE
Last Check Time: 13-10-2005 12:21:40
Status Data Age: 0d 0h 44m 12s
Next Scheduled Active Check: N/A
Latency: < 1 second
Check Duration: < 1 second
Last State Change: 12-10-2005 22:15:18
**Current State Duration: 0d 14h 50m 34s**
Last Service Notification: N/A
**Current Notification Number: 0**
Is This Service Flapping? N/A
Percent State Change: N/A
In Scheduled Downtime? NO
Last Update: 13-10-2005 13:05:43

Service Checks: DISABLED
Passive Checks: ENABLED
Service Notifications: ENABLED
Event Handler: ENABLED
Flap Detection: ENABLED


#2

Are you possibly using escalations somewhere? Maybe you defined first_notification and last_notification values somewhere in the escalations, contacts or contactgroups… I’m not sure why else it would get through only once…
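
For anyone unsure what to search for: the block below is a minimal sketch of a Nagios 1.x host escalation, using the test host and contact group from the first post. The escalation itself is hypothetical and is not taken from the poster's configuration. As far as the escalation documentation goes, a notification_interval of 0 inside an escalation should suppress repeat notifications while that escalation applies, which from the outside would look like a single notification.

```cfg
# Hypothetical escalation entry, shown only as the kind of thing to look for.
define hostescalation {
    host_name              cgbkhusrvray00
    contact_groups         sun-admins
    first_notification     1    ; escalation starts with the first notification
    last_notification      0    ; 0 = escalation never stops applying
    notification_interval  0    ; 0 = send once, then suppress repeats
}
```

If nothing of this kind (hostescalation, hostgroupescalation or serviceescalation) exists in any config file, escalations can probably be ruled out.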

Is Nagios generating only one? Check the notifications page…

Luca


#3

No, I’m not using escalations anywhere.

Yes, Nagios is generating only one notification.


#4

Try adding an HTTP check, or something else not dependent on ping that you can stop, to see whether service notifications get sent multiple times or not. Using ping as a service check isn’t a good example.
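
A rough sketch of such a test service, assuming a web server is running on the test host and that the check_http plugin is installed in the $USER1$ directory. The check_http command definition and the CHECK_HTTP service are assumptions for illustration; everything else reuses names from the first post.

```cfg
# checkcommands.cfg: assumed command definition (not shown in the original post)
define command {
    command_name  check_http
    command_line  $USER1$/check_http -I $HOSTADDRESS$
}

# services.cfg: test service on the same host, built on generic-service
define service {
    use                    generic-service
    host_name              cgbkhusrvray00
    service_description    CHECK_HTTP
    is_volatile            0
    check_period           24x7
    max_check_attempts     3
    normal_check_interval  2
    retry_check_interval   1
    contact_groups         sun-admins
    notification_interval  5
    notification_period    24x7
    notification_options   c,w,r
    check_command          check_http
}
```

Stopping only the web server keeps the host reachable by ping, so the service should go CRITICAL while the host stays UP, and repeat service notifications can be observed on their own.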

Did you check for the first/last_notification part?

I can’t see errors in those config files… at least not yet…

Luca