Hi,
I've got Nagios 1.2 monitoring a couple of services on about 200 servers, all using passive checks. This works fine. Checks happen every 2 hours.
Now I need to check host availability for these servers more frequently, and there are some additional non-server sites that we need to monitor. If a site is down, Nagios should send periodic notifications to a group of users.
So I thought I would use active check_ping services for that.
As a start, I added a check_ping service to one test host and brought the host down.
I received a host down notification, but only once! I tried restarting Nagios after clearing the MySQL tables and /usr/local/nagios/var, but that did not help.
I also did not receive any critical or unreachable notification for the check_ping service itself.
The “notification_interval” parameter in both the host and service definitions is set to a non-zero value.
What could be the problem?
Below are my host, service and check command definitions.
Thanks!
=-=-=-=-=-=-=-=-=- hosts.cfg -=-=-=-=-=-=-=-=-=-
define host {
name generic-host
event_handler_enabled 0
flap_detection_enabled 0
process_perf_data 1
retain_status_information 1
retain_nonstatus_information 1
register 0
}
define host {
use generic-host
host_name cgbkhusrvray00
alias test server
address 10.10.1.200
check_command check-host-alive
max_check_attempts 3
notification_interval 30
notification_period 24x7
notification_options d,u,r
}
=-=-=-=-=-=-=-=-=- services.cfg -=-=-=-=-=-=-=-=-=-
define service {
name generic-service
active_checks_enabled 1
passive_checks_enabled 1
parallelize_check 1
obsess_over_service 1
check_freshness 0
notifications_enabled 1
event_handler_enabled 1
flap_detection_enabled 1
process_perf_data 1
retain_status_information 1
retain_nonstatus_information 1
register 0
}
define service {
name check_ping_tradelink
use generic-service
service_description CHECK_PING
is_volatile 0
check_period 24x7
max_check_attempts 1
normal_check_interval 2
retry_check_interval 1
contact_groups sun-admins
notification_interval 5
notification_period 24x7
notification_options c,w,r
passive_checks_enabled 0
active_checks_enabled 1
check_freshness 0
check_command check_ping
register 0
}
define service {
use check_ping_tradelink
host_name cgbkhusrvray00
register 1
}
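To double-check what the registered service actually ends up with, here is a rough sketch of how I understand the template chain to resolve. The `resolve` helper is hypothetical and only a subset of the directives is included. It assumes that locally set values override inherited ones and that `name` and `register` are not inherited, which I have not verified against Nagios 1.2 itself.

```python
# Hypothetical helper: flatten a chain of Nagios-style templates into one dict.
# Assumption: local values override inherited ones, and "name", "register"
# and "use" belong to each definition rather than being inherited.
NOT_INHERITED = {"name", "register", "use"}


def resolve(definition, templates):
    """Return the effective directives for a definition and its parents."""
    merged = {}
    parent_name = definition.get("use")
    if parent_name is not None:
        parent = resolve(templates[parent_name], templates)
        merged.update({k: v for k, v in parent.items() if k not in NOT_INHERITED})
    # Local directives win over anything inherited from the template chain.
    merged.update({k: v for k, v in definition.items() if k != "use"})
    return merged


# Subset of the definitions from services.cfg above.
templates = {
    "generic-service": {
        "name": "generic-service",
        "active_checks_enabled": "1",
        "passive_checks_enabled": "1",
        "notifications_enabled": "1",
        "register": "0",
    },
    "check_ping_tradelink": {
        "name": "check_ping_tradelink",
        "use": "generic-service",
        "service_description": "CHECK_PING",
        "max_check_attempts": "1",
        "notification_interval": "5",
        "notification_options": "c,w,r",
        "passive_checks_enabled": "0",
        "register": "0",
    },
}

service = {"use": "check_ping_tradelink", "host_name": "cgbkhusrvray00", "register": "1"}
effective = resolve(service, templates)

for key in sorted(effective):
    print(key, effective[key])
```

If that reading is right, the registered service has notifications enabled, passive checks disabled and a notification_interval of 5, which matches what the service state page shows.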
=-=-=-=-=-=-=-=-=- checkcommands.cfg -=-=-=-=-=-=-=-=-=-
define command {
command_name check_ping
command_line $USER1$/check_ping -H $HOSTADDRESS$ -w 1500,80% -c 3000,100% -p 1 -t 30
}
define command {
command_name check-host-alive
command_line $USER1$/check_ping -H $HOSTADDRESS$ -w 1000.0,80% -c 5000.0,100% -p 1 -t 30
}
=-=-=-=-=-=-=-===-==- host state info -=-=-=-==
Host Status: DOWN
Status Information: PING CRITICAL - Packet loss = 100%
Last Status Check: 13-10-2005 13:01:30
Status Data Age: 0d 0h 2m 1s
Last State Change: 12-10-2005 21:15:49
**Current State Duration: 0d 15h 47m 42s**
Last Host Notification: 12-10-2005 21:15:49
**Current Notification Number: 1**
Is This Host Flapping? N/A
Percent State Change: N/A
In Scheduled Downtime? NO
Last Update: 13-10-2005 13:03:08
Host Checks: ENABLED
Host Notifications: ENABLED
Event Handler: DISABLED
Flap Detection: DISABLED
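To put a number on what I expected to see, here is a rough calculation of how many host notifications should have gone out by now. It assumes interval_length is at its default of 60 seconds, so notification_interval 30 means one re-notification every 30 minutes. The function name is just for illustration, and the calculation ignores anything else that might legitimately suppress notifications.

```python
from datetime import timedelta

# Assumption: interval_length is left at its default of 60 seconds,
# so notification_interval is effectively expressed in minutes.
INTERVAL_LENGTH_SECONDS = 60


def expected_notifications(down_for: timedelta, notification_interval: int) -> int:
    """Initial notification plus one re-notification per fully elapsed interval."""
    interval_seconds = notification_interval * INTERVAL_LENGTH_SECONDS
    return 1 + int(down_for.total_seconds()) // interval_seconds


# Host state duration from the status page: 0d 15h 47m 42s, notification_interval 30.
host_down = timedelta(hours=15, minutes=47, seconds=42)
print(expected_notifications(host_down, 30))
```

By that estimate there should have been around 32 host notifications, while the status page shows a notification number of 1.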
=-=-=-====-=-= service state info =-=-=-=-
Current Status: CRITICAL
Status Information: PING CRITICAL - Packet loss = 100%
Current Attempt: 1/1
State Type: HARD
Last Check Type: ACTIVE
Last Check Time: 13-10-2005 13:02:59
Status Data Age: 0d 0h 1m 47s
Next Scheduled Active Check: 13-10-2005 13:04:59
Latency: < 1 second
Check Duration: 10 seconds
Last State Change: 12-10-2005 21:15:51
**Current State Duration: 0d 15h 48m 55s**
Last Service Notification: N/A
**Current Notification Number: 0**
Is This Service Flapping? N/A
Percent State Change: N/A
In Scheduled Downtime? NO
Last Update: 13-10-2005 13:04:39
Service Checks: ENABLED
Passive Checks: DISABLED
Service Notifications: ENABLED
Event Handler: ENABLED
Flap Detection: ENABLED
=-=-=-=-==- service state info for a passive check -=-===-=
Current Status: CRITICAL
Status Information: CRITICAL: Service results are stale!
Current Attempt: 1/1
State Type: HARD
Last Check Type: ACTIVE
Last Check Time: 13-10-2005 12:21:40
Status Data Age: 0d 0h 44m 12s
Next Scheduled Active Check: N/A
Latency: < 1 second
Check Duration: < 1 second
Last State Change: 12-10-2005 22:15:18
**Current State Duration: 0d 14h 50m 34s**
Last Service Notification: N/A
**Current Notification Number: 0**
Is This Service Flapping? N/A
Percent State Change: N/A
In Scheduled Downtime? NO
Last Update: 13-10-2005 13:05:43
Service Checks: DISABLED
Passive Checks: ENABLED
Service Notifications: ENABLED
Event Handler: ENABLED
Flap Detection: ENABLED