Host down notifications sent once only - why?

vint_au · October 13, 2005, 3:14am

Hi,

I’ve got Nagios 1.2 monitoring a couple of services on about 200 servers, all using passive checks. This works fine. Checks run every 2 hours.

Now I need to check host availability for these servers more frequently, and there are some additional non-server sites that we need to monitor. If a site is down, Nagios should send periodic notifications to a group of users.

So I thought I’d use active check_ping services for that.

As a start, I added a check_ping service to one test host and brought the host down.

I received a host down notification, but only once! I tried restarting Nagios after cleaning out the MySQL tables and /usr/local/nagios/var, but that did not help.
Also, I did not receive any critical or unreachable notification for the check_ping service itself.

The “notification_interval” parameter in both the host and service definitions is set to a non-zero value.

What could be the problem?

Below are my host, service and check command definitions.

Thanks!

-=-=-=-=-=-=-=-=-=- hosts.cfg -=-=-=-=-=-=-=-=-=-
define host {
    name generic-host
    event_handler_enabled 0
    flap_detection_enabled 0
    process_perf_data 1
    retain_status_information 1
    retain_nonstatus_information 1
    register 0
}
define host {
    use generic-host
    host_name cgbkhusrvray00
    alias test server
    address 10.10.1.200
    check_command check-host-alive
    max_check_attempts 3
    notification_interval 30
    notification_period 24x7
    notification_options d,u,r
}

=-=-=-=-=-=-=-=-=- services.cfg -=-=-=-=-=-=-=-=-=-
define service {
    name generic-service
    active_checks_enabled 1
    passive_checks_enabled 1
    parallelize_check 1
    obsess_over_service 1
    check_freshness 0
    notifications_enabled 1
    event_handler_enabled 1
    flap_detection_enabled 1
    process_perf_data 1
    retain_status_information 1
    retain_nonstatus_information 1
    register 0
}
define service {
    name check_ping_tradelink
    use generic-service
    service_description CHECK_PING
    is_volatile 0
    check_period 24x7
    max_check_attempts 1
    normal_check_interval 2
    retry_check_interval 1
    contact_groups sun-admins
    notification_interval 5
    notification_period 24x7
    notification_options c,w,r
    passive_checks_enabled 0
    active_checks_enabled 1
    check_freshness 0
    check_command check_ping
    register 0
}

define service {
    use check_ping_tradelink
    host_name cgbkhusrvray00
    register 1
}

=-=-=-=-=-=-=-=-=- checkcommands.cfg -=-=-=-=-=-=-=-=-=-
define command {
    command_name check_ping
    command_line $USER1$/check_ping -H $HOSTADDRESS$ -w 1500,80% -c 3000,100% -p 1 -t 30
}
define command {
    command_name check-host-alive
    command_line $USER1$/check_ping -H $HOSTADDRESS$ -w 1000.0,80% -c 5000.0,100% -p 1 -t 30
}

=-=-=-=-=-=-=-===-==- host state info -=-=-=-==
Host Status: DOWN
Status Information: PING CRITICAL - Packet loss = 100%
Last Status Check: 13-10-2005 13:01:30
Status Data Age: 0d 0h 2m 1s
Last State Change: 12-10-2005 21:15:49
**Current State Duration: 0d 15h 47m 42s**
Last Host Notification: 12-10-2005 21:15:49
**Current Notification Number: 1**
Is This Host Flapping? N/A
Percent State Change: N/A
In Scheduled Downtime? NO
Last Update: 13-10-2005 13:03:08

Host Checks: ENABLED
Host Notifications: ENABLED
Event Handler: DISABLED
Flap Detection: DISABLED

=-=-=-====-=-= service state info =-=-=-=-
Current Status: CRITICAL
Status Information: PING CRITICAL - Packet loss = 100%
Current Attempt: 1/1
State Type: HARD
Last Check Type: ACTIVE
Last Check Time: 13-10-2005 13:02:59
Status Data Age: 0d 0h 1m 47s
Next Scheduled Active Check: 13-10-2005 13:04:59
Latency: < 1 second
Check Duration: 10 seconds
Last State Change: 12-10-2005 21:15:51
**Current State Duration: 0d 15h 48m 55s**
Last Service Notification: N/A
**Current Notification Number: 0**
Is This Service Flapping? N/A
Percent State Change: N/A
In Scheduled Downtime? NO
Last Update: 13-10-2005 13:04:39

Service Checks: ENABLED
Passive Checks: DISABLED
Service Notifications: ENABLED
Event Handler: ENABLED
Flap Detection: ENABLED

=-=-=-=-==- service state info for a passive check -=-===-=
Current Status: CRITICAL
Status Information: CRITICAL: Service results are stale!
Current Attempt: 1/1
State Type: HARD
Last Check Type: ACTIVE
Last Check Time: 13-10-2005 12:21:40
Status Data Age: 0d 0h 44m 12s
Next Scheduled Active Check: N/A
Latency: < 1 second
Check Duration: < 1 second
Last State Change: 12-10-2005 22:15:18
**Current State Duration: 0d 14h 50m 34s**
Last Service Notification: N/A
**Current Notification Number: 0**
Is This Service Flapping? N/A
Percent State Change: N/A
In Scheduled Downtime? NO
Last Update: 13-10-2005 13:05:43

Service Checks: DISABLED
Passive Checks: ENABLED
Service Notifications: ENABLED
Event Handler: ENABLED
Flap Detection: ENABLED

luca · October 13, 2005, 10:01am

Are you possibly using escalations somewhere? Maybe you defined first_notification and last_notification values in an escalation for those contacts or contact groups… Otherwise I’m not sure why only one notification would get through…
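
To show the kind of definition I mean, here is a rough sketch of a Nagios 1.x host escalation. It is not taken from your files: the contact group name escalation-admins is made up, and I am assuming the escalation is attached to your test host. As far as I understand it, with an entry like this the first notification goes to the host’s normal contacts and every later one goes to the escalation’s contact group instead, so the original recipients would see exactly one message.

```
# Hypothetical example only; "escalation-admins" is an invented contact group.
define hostescalation {
    host_name cgbkhusrvray00
    # This entry takes over starting with the 2nd notification...
    first_notification 2
    # ...and 0 means it keeps applying for as long as the problem lasts.
    last_notification 0
    contact_groups escalation-admins
    notification_interval 30
}
```

If something like this exists anywhere in your escalation config, the later notifications may still be generated but sent to a different group.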

Is Nagios generating only one? Check the notifications page…

Luca

vint_au · October 13, 2005, 11:49am

No, I’m not using escalations.

Yes, Nagios is generating only one notification.

luca · October 14, 2005, 8:22am

Try adding an HTTP check, or some other service that does not depend on ping and that you can stop, to see whether service notifications get sent multiple times or not. Using ping as a service check isn’t a good test case.
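
Something along these lines might do as a test. It is only a sketch, assuming the test host runs a web server you can stop and that the check_http plugin is installed in the $USER1$ directory. The service description HTTP and the max_check_attempts value are my own choices; the template, host name and contact group are reused from your post.

```
# Assumes check_http exists under $USER1$ (not shown in the original config).
define command {
    command_name check_http
    command_line $USER1$/check_http -H $HOSTADDRESS$
}

# Test service: stop the web server while the host stays up and pingable.
define service {
    use generic-service
    host_name cgbkhusrvray00
    service_description HTTP
    is_volatile 0
    check_period 24x7
    max_check_attempts 3
    normal_check_interval 2
    retry_check_interval 1
    contact_groups sun-admins
    notification_interval 5
    notification_period 24x7
    notification_options c,w,r
    check_command check_http
}
```

With the host still answering ping, a critical HTTP service should tell you whether repeat service notifications arrive every notification_interval or not.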

Did you check for the first/last_notification part?

I can’t see errors in those config files… at least not yet…

Luca