How Do I Stop the Host Ping Check?

I apparently have either a misunderstanding of the setup (very likely) or a misconfiguration in my system (also almost certain…). Since first bringing up this system I have used a ping check on every machine. To achieve this I put the “check_host_alive” command in the configuration template for “prod-host”. I then point all of my hosts to this template and they all ping. For organizational reasons the hosts are spread across many different host groups in separate host cfg files. It is a large implementation - I have well over 700 servers running more than 4000 processes. It all works quite nicely.

However, I just got tasked with monitoring a new “stealth” piece of equipment on our network that does not accept ICMP. I can connect via passwordless ssh, and I planned on simply replacing the check_host_alive command by creating a new “noping-host” template. I just took the check_host_alive command out of this template. I created 4 service checks for these machines using the check_by_ssh commands and they all work, but I still get the ping! I need to get rid of the ping. I have restarted Nagios a number of times. It's not in the template chain, and I cannot figure out where it is coming from. Can anyone help? These are brand new machines - they aren’t a re-used IP with a duplicate out there or anything similar. It is a little vexing.

Thanks

Joe

What are you using for the check_command parameter in the noping-host template? It wasn’t clear. Are you sure about the order of precedence in the template chain?
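For reference, here is a small sketch of how a directive such as check_command resolves through a template chain. The template names, host name and address are hypothetical and not taken from this thread, and required directives such as max_check_attempts are left out for brevity. A value set directly in a host definition overrides anything inherited, and when several templates are listed in "use", the first one listed wins for directives they both define.

```
# Hypothetical template that pings
define host {
    name            base-ping
    check_command   check_host_alive
    register        0
}

# Hypothetical template that checks over ssh instead
define host {
    name            base-ssh
    check_command   check_dummy_ssh
    register        0
}

# base-ssh is listed first, so its check_command is the one inherited.
# A check_command written directly in this definition would override both.
define host {
    use             base-ssh,base-ping
    host_name       example-host
    address         192.0.2.10
}
```

If the ping still shows up with a chain like this, the value is probably coming from somewhere other than the templates.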

Hey Tim - thanks for the reply - let me post my config files here and see if you see something hairy:

Host Config:
define host {
use noping-host
host_name host_name
alias alias
hostgroups groupname
address 172.xx.x.xxx
}

Then the 4 service definitions for that host (contained in the host.cfg file)

define service {
use generic-service
service_description check_dummy_ssh
host_name Host1, Host2, Host3
check_command !check_dummy_ssh!
}

define service {
use generic-service
service_description check_ssh_disk
host_name Host1, Host2, Host3
check_command !check_ssh_disk!
}

define service {
use generic-service
service_description check_ssh_swap
host_name Host1, Host2, Host3
check_command !check_ssh_swap!
}

define service {
use generic-service
service_description check_ssh_load
host_name Host1, Host2, Host3
check_command !check_ssh_load!
}

The noping and the generic template:

define host{
name noping-host ; The name of this host template
use generic-host ; This template inherits other values from the generic-host template
check_period 24x7 ; By default, Linux hosts are checked round the clock
check_interval 15 ; Actively check the host every 15 minutes
retry_interval 1 ; Schedule host check retries at 1 minute intervals
max_check_attempts 5 ; Check each Linux host 5 times (max)
check_command check_dummy_ssh
notification_interval 60 ; Resend notifications every hour
notification_options d,u,r ; Only send notifications for specific host states
contact_groups admins ; Notifications get sent to the admins by default
register 0 ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL HOST, JUST A TEMPLATE!
}

define host{
name generic-host ; The name of this host template
notifications_enabled 1 ; Host notifications are enabled
event_handler_enabled 1 ; Host event handler is enabled
flap_detection_enabled 1 ; Flap detection is enabled
failure_prediction_enabled 1 ; Failure prediction is enabled
process_perf_data 1 ; Process performance data
retain_status_information 1 ; Retain status information across program restarts
retain_nonstatus_information 1 ; Retain non-status information across program restarts
contact_groups admins ; Notifications get sent to the admins by default
notification_period 24x7 ; Send host notifications at any time
register 0 ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL HOST, JUST A TEMPLATE!
}

and the service template:

define service{
name generic-service ; The ‘name’ of this service template
active_checks_enabled 1 ; Active service checks are enabled
passive_checks_enabled 1 ; Passive service checks are enabled/accepted
parallelize_check 1 ; Active service checks should be parallelized (disabling this can lead to major performance problems)
obsess_over_service 1 ; We should obsess over this service (if necessary)
check_freshness 0 ; Default is to NOT check service 'freshness'
notifications_enabled 1 ; Service notifications are enabled
event_handler_enabled 1 ; Service event handler is enabled
flap_detection_enabled 1 ; Flap detection is enabled
failure_prediction_enabled 1 ; Failure prediction is enabled
process_perf_data 1 ; Process performance data
retain_status_information 1 ; Retain status information across program restarts
retain_nonstatus_information 1 ; Retain non-status information across program restarts
is_volatile 0 ; The service is not volatile
check_period 24x7 ; The service can be checked at any time of the day
max_check_attempts 3 ; Re-check the service up to 3 times in order to determine its final (hard) state
normal_check_interval 15 ; Check the service every 15 minutes under normal conditions
retry_check_interval 1 ; Re-check the service every ONE minute until a hard state can be determined
contact_groups admins, techops ; Notifications get sent out to everyone in the 'admins' and 'techops' groups
notification_options w,u,c,r ; Send notifications about warning, unknown, critical, and recovery events
notification_interval 60 ; Re-notify about service problems every hour
notification_period 24x7 ; Notifications can be sent out at any time
register 0 ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL SERVICE, JUST A TEMPLATE!
}

and finally the check commands:

‘check_ssh_swap’ command definition

define command{
command_name check_ssh_swap
command_line /usr/lib/nagios/plugins/check_by_ssh -H $HOSTADDRESS$ -l username -t 60 -C "/home/nagios/plugins/check_swap -a -w15% -c5%"
}

'check_ssh_load' command definition

define command{
command_name check_ssh_load
command_line $USER1$/check_by_ssh -H $HOSTADDRESS$ -l username -t 30 -C "/home/nagios/plugins/check_load -r -w 85,85,85 -c 95,95,95"
}

‘ssh_disk’ command definition

define command{
command_name check_ssh_disk
command_line $USER1$/check_by_ssh -H $HOSTADDRESS$ -l nagios -t 30 -C "/home/nagios/plugins/check_disk -w15% -c5% -l"
}

‘check_dummy_via_SSH’ command definition - returns what you send it - plugin resides on the system being checked - if it's not reachable it fails. This is a replacement for “Ping” when ICMP is blocked.

define command{
command_name check_dummy_ssh
command_line $USER1$/check_by_ssh -H $HOSTADDRESS$ -l nagios -t 30 -C "/home/nagios/plugins/check_dummy 0"
}

I know this is a lot - I hope I am not upsetting too many people by posting such a large amount of info here, but I figured it was pertinent.

Let me know if you see anything I don’t.

Your config looks good to me. Most of the problems I’ve seen like this have related to inheritance issues but everything here looks good.
Did you at one point use the ping check to see if the host is alive for your new stealth device? I’ve seen cases where old definitions get stuck and need to be cleared out - I ran into the same thing trying to convert a couple of boxes in my DMZ to use RDP port access for a host check instead of ping. If memory serves, I temporarily disabled the retention of state data and it cleared up. My guess is that deleting the state retention file and restarting Nagios would do the trick as well.
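A minimal sketch of that temporary change, for anyone who wants to try it. These two directives live in the main nagios.cfg, not in the object files. The retention file path shown is an assumption and varies by install, so check your own nagios.cfg for the real location.

```
# nagios.cfg (main configuration file)

# Temporarily stop Nagios from saving state at shutdown and reloading it at startup
retain_state_information=0

# The file that would be deleted in the alternative approach.
# This path is an assumption; use whatever your nagios.cfg already points to.
state_retention_file=/var/lib/nagios3/retention.dat
```

Restart Nagios after the change, confirm the host check now uses the ssh command, then set retain_state_information back to 1 and restart again. Clearing retained state this way likely also discards other retained information such as acknowledgements, so a quiet period is the safer time to do it.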