Hi. We hacked the host status check (check-host-alive) a bit and replaced it with a custom script for the needs of Nagios BPI.
Now we have noticed that whenever the host becomes CRITICAL (DOWN), service notifications are suppressed. We need to avoid this suppression somehow.
Is there any standard way to do that?
Is d selected in the host's notification_options?
Is it possible to disable this suppression with hardly any blood shed?
define host {
host_name localhost
alias localhost
address 127.7.7.7
check_period 24x7
check_command check_service_my_new!$TOTALHOSTSERVICESOK$!$TOTALHOSTSERVICESWARNING$!$TOTALHOSTSERVICESCRITICAL$!$TOTALHOSTSERVICESUNKNOWN$
contact_groups admins
notification_period workhours
initial_state o
check_interval 5.000000
retry_interval 1.000000
max_check_attempts 10
active_checks_enabled 1
passive_checks_enabled 1
obsess_over_host 1
event_handler_enabled 1
low_flap_threshold 0.000000
high_flap_threshold 0.000000
flap_detection_enabled 1
flap_detection_options o,d,u
freshness_threshold 0
check_freshness 0
notification_options d,u,r
notifications_enabled 1
notification_interval 10.000000
first_notification_delay 0.000000
stalking_options n
process_perf_data 1
failure_prediction_enabled 1
notes_url http://10.0.101.10/nagios/cgi-bin/status.cgi?hostgroup=$HOSTNAME$&style=detail
action_url /nagios/cgi-bin/status.cgi?hostgroup=$HOSTNAME$&style=detail
retain_status_information 1
retain_nonstatus_information 1
}
The host looks ok…
Does check_service_my_new have the c option set in notification_options?
Yep. Everything is OK and service notifications get delivered if the host status command returns OK. But when the host is in the DOWN state (in reality the host is up and running), service notifications appear to be suppressed.
Do you understand the case?
I think I understood what has been described…
In the service definition, do you have the c option selected in the notification_options?
It looks like you have a configuration problem somewhere. Some notifications do get sent, so you don’t have a generic config problem.
define service {
host_name localhost
service_description SSH
check_period 24x7
check_command check_ssh
contact_groups admins
notification_period 24x7
initial_state o
check_interval 5.000000
retry_interval 1.000000
max_check_attempts 4
is_volatile 0
parallelize_check 1
active_checks_enabled 1
passive_checks_enabled 1
obsess_over_service 1
event_handler_enabled 1
low_flap_threshold 0.000000
high_flap_threshold 0.000000
flap_detection_enabled 1
flap_detection_options o,w,u,c
freshness_threshold 0
check_freshness 0
notification_options u,w,c,r
notifications_enabled 1
notification_interval 10.000000
first_notification_delay 0.000000
stalking_options n
process_perf_data 1
failure_prediction_enabled 1
retain_status_information 1
retain_nonstatus_information 1
_DUMMYVAR 123
}
check_service_my_new is the one we were talking about
Do you see the critical notifications in the notifications page?
Wait-wait-wait-wait
check_service_my_new is a script. It’s the check command.
I showed you one of the services I’m working on.
When check_service_my_new returns the UP state for the host, the SSH service notification gets delivered and is displayed in the notifications page.
But when check_service_my_new returns the DOWN state, killing the SSH daemon does nothing more than show CRITICAL for SSH in the notifications page. No notification about the SSH service is ever delivered. The only notification sent is the one saying the host state is DOWN.
By default, when services are up, NO host checks are run.
When a service is down, Nagios checks whether the host is up by running the check-host-alive check.
If a host is DOWN no notifications should be sent for the services on that host. If a host is down how could a service be running?
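To make the rule above concrete, here is a minimal Python sketch of the filter as it is described in this thread. It is a simplified model and not Nagios source code: the function name and the state-to-option mapping are illustrative assumptions, and only problem notifications (not recoveries) are covered.

```python
# Simplified model of the suppression described above; NOT Nagios source code.
# The function name and the mapping below are illustrative assumptions.

# Service problem state -> letter used in the service's notification_options
OPTION_FOR_STATE = {"WARNING": "w", "CRITICAL": "c", "UNKNOWN": "u"}


def service_problem_notification_allowed(host_state, service_state, notification_options):
    """Return True if a service problem notification would pass both filters."""
    # Filter 1: the service must be configured to notify on this state.
    if OPTION_FOR_STATE[service_state] not in notification_options:
        return False
    # Filter 2: the suppression discussed in this thread. If the host is not UP,
    # problems on its services are not notified.
    return host_state == "UP"


ssh_options = {"u", "w", "c", "r"}  # from the SSH service definition above
print(service_problem_notification_allowed("UP", "CRITICAL", ssh_options))    # True
print(service_problem_notification_allowed("DOWN", "CRITICAL", ssh_options))  # False
```

This is why a custom host check that reports DOWN hides the SSH alert even though the service itself is configured with c.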
You want the notifications for the individual services to be sent even if the host is down? I think I just understood your request now…
Not sure it can be done… I think it could… BUT anyway, you could use an event handler script that sends out “fake” emails for the services when the host goes down.
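A rough Python sketch of what such an event handler helper might look like follows. Everything in it is hypothetical: the function names, the subject format, and the example.com addresses are made up for illustration, and it assumes the handler command passes $HOSTSTATE$ and $HOSTSTATETYPE$ as arguments and that the caller already knows which services to report.

```python
# Sketch of a host event handler helper; all names here are hypothetical.
# Assumes Nagios passes $HOSTSTATE$ and $HOSTSTATETYPE$ as command arguments.
from email.message import EmailMessage


def should_relay(host_state, state_type):
    """Relay service alerts only once the host is in a HARD DOWN state."""
    return host_state == "DOWN" and state_type == "HARD"


def build_service_alert(host_name, service_description, service_state, sender, recipient):
    """Build (but do not send) a notification mail for one service."""
    msg = EmailMessage()
    msg["Subject"] = f"PROBLEM: {host_name}/{service_description} is {service_state}"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(
        f"Host {host_name} is DOWN, so Nagios suppressed this alert.\n"
        f"Service {service_description} is {service_state}.\n"
    )
    return msg


if should_relay("DOWN", "HARD"):
    alert = build_service_alert(
        "localhost", "SSH", "CRITICAL",
        "nagios@example.com", "admins@example.com",  # placeholder addresses
    )
    print(alert["Subject"])  # PROBLEM: localhost/SSH is CRITICAL
```

Actually sending the message is left out on purpose. One option, assuming a local mail relay is available, would be smtplib.SMTP("localhost").send_message(alert).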
As I mentioned before, in our case when the host is in the DOWN state it is actually not down. Let's say ICMP ping becomes filtered somewhere on the network between the Nagios server and the monitored host. Then the host status turns DOWN, but the services are working as expected.
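For the filtered-ICMP example specifically, one possible workaround is a host check that does not depend on ping. The Python sketch below is an assumption-laden illustration, not the script used in this thread: the function name is made up, and it treats a successful TCP connection to a known port (for example 22) as proof that the host is up.

```python
# Hypothetical host check that avoids ICMP; not the script from this thread.
import socket

UP, DOWN = 0, 2  # plugin exit codes that Nagios maps to host UP / DOWN


def host_state_via_tcp(address, port, timeout=3.0):
    """Return UP if a TCP connection to address:port succeeds, else DOWN."""
    try:
        with socket.create_connection((address, port), timeout=timeout):
            return UP
    except OSError:
        # Covers refused connections, timeouts and unreachable networks.
        return DOWN
```

A plugin wrapper would call sys.exit(host_state_via_tcp(address, 22)) with the address taken from $HOSTADDRESS$. The host then stays UP as long as that port answers, so service notifications are not suppressed by a blocked ping alone.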