How Do I Stop the Host Ping Check?

joe1871 · July 21, 2011, 12:22am

I apparently have either a misunderstanding of the setup (very likely) or a misconfiguration in my system (also almost certain…). Ever since I first turned up this system I have used a ping check on every machine. To achieve this I put the “check_host_alive” command in the configuration template for “prod-host”. I then point all of my hosts to this template and they all ping. I have them in many different host groups in separate host cfg files for organizational reasons. It is a large implementation - I have well over 700 servers running more than 4000 processes. It all works quite nicely.

However, I just got tasked with monitoring a new “stealth” piece of equipment on our network that does not accept ICMP. I can connect via passwordless ssh, and I planned on simply replacing the check_host_alive command by creating a new “noping-host” template. I just took out the check_host_alive command in this template. I created 4 service checks for these machines using the check_by_ssh commands and they all work, but I still get the ping! I need to get rid of the ping. I have restarted Nagios a number of times. It's not in the template chain, and I cannot figure out where it is coming from. Can anyone help? These are brand new machines - they aren’t a re-used IP with a duplicate out there or anything similar. It is a little vexing.

Thanks

Joe

timbCFCA · July 21, 2011, 7:33pm

What are you using for the check_command parameter in the noping-host template? It wasn’t clear. Are you sure about the order of precedence in the template chain?

joe1871 · July 21, 2011, 9:10pm

Hey Tim - thanks for the reply - let me post my config files here and see if you see something hairy:

Host Config:
define host {
use noping-host
host_name host_name
alias alias
hostgroups groupname
address 172.xx.x.xxx
}

Then the 4 service definitions for that host (contained in the host.cfg file)

define service {
use generic-service
service_description check_dummy_ssh
host_name Host1, Host2, Host3
check_command !check_dummy_ssh!
}

define service {
use generic-service
service_description check_ssh_disk
host_name Host1, Host2, Host3
check_command !check_ssh_disk!
}

define service {
use generic-service
service_description check_ssh_swap
host_name Host1, Host2, Host3
check_command !check_ssh_swap!
}

define service {
use generic-service
service_description check_ssh_load
host_name Host1, Host2, Host3
check_command !check_ssh_load!
}

The noping and the generic template:

define host{
name noping-host ; The name of this host template
use generic-host ; This template inherits other values from the generic-host template
check_period 24x7 ; By default, Linux hosts are checked round the clock
check_interval 15 ; Actively check the host every 15 minutes
retry_interval 1 ; Schedule host check retries at 1 minute intervals
max_check_attempts 5 ; Check each Linux host 5 times (max)
check_command check_dummy_ssh
notification_interval 60 ; Resend notifications every hour
notification_options d,u,r ; Only send notifications for specific host states
contact_groups admins ; Notifications get sent to the admins by default
register 0 ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL HOST, JUST A TEMPLATE!
}

define host{
name generic-host ; The name of this host template
notifications_enabled 1 ; Host notifications are enabled
event_handler_enabled 1 ; Host event handler is enabled
flap_detection_enabled 1 ; Flap detection is enabled
failure_prediction_enabled 1 ; Failure prediction is enabled
process_perf_data 1 ; Process performance data
retain_status_information 1 ; Retain status information across program restarts
retain_nonstatus_information 1 ; Retain non-status information across program restarts
contact_groups admins ; Notifications get sent to the admins by default
notification_period 24x7 ; Send host notifications at any time
register 0 ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL HOST, JUST A TEMPLATE!
}

and the service template:

define service{
name generic-service ; The 'name' of this service template
active_checks_enabled 1 ; Active service checks are enabled
passive_checks_enabled 1 ; Passive service checks are enabled/accepted
parallelize_check 1 ; Active service checks should be parallelized (disabling this can lead to major performance problems)
obsess_over_service 1 ; We should obsess over this service (if necessary)
check_freshness 0 ; Default is to NOT check service 'freshness'
notifications_enabled 1 ; Service notifications are enabled
event_handler_enabled 1 ; Service event handler is enabled
flap_detection_enabled 1 ; Flap detection is enabled
failure_prediction_enabled 1 ; Failure prediction is enabled
process_perf_data 1 ; Process performance data
retain_status_information 1 ; Retain status information across program restarts
retain_nonstatus_information 1 ; Retain non-status information across program restarts
is_volatile 0 ; The service is not volatile
check_period 24x7 ; The service can be checked at any time of the day
max_check_attempts 3 ; Re-check the service up to 3 times in order to determine its final (hard) state
normal_check_interval 15 ; Check the service every 10 minutes under normal conditions
retry_check_interval 1 ; Re-check the service every ONE minutes until a hard state can be determined
contact_groups admins, techops ; Notifications get sent out to everyone in the 'admins' group
notification_options w,u,c,r ; Send notifications about warning, unknown, critical, and recovery events
notification_interval 60 ; Re-notify about service problems every hour
notification_period 24x7 ; Notifications can be sent out at any time
register 0 ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL SERVICE, JUST A TEMPLATE!
}

and finally the check commands:

‘check_ssh_swap’ command definition

define command{
command_name check_ssh_swap
command_line /usr/lib/nagios/plugins/check_by_ssh -H $HOSTADDRESS$ -l username -t 60 -C "/home/nagios/plugins/check_swap -a -w15% -c5%"
}

‘check_ssh_load’ command definition

define command{
command_name check_ssh_load
command_line $USER1$/check_by_ssh -H $HOSTADDRESS$ -l username -t 30 -C "/home/nagios/plugins/check_load -r -w 85,85,85 -c 95,95,95"
}

‘ssh_disk’ command definition

define command{
command_name check_ssh_disk
command_line $USER1$/check_by_ssh -H $HOSTADDRESS$ -l nagios -t 30 -C "/home/nagios/plugins/check_disk -w15% -c5% -l"
}

‘check_dummy_via_SSH’ command definition - returns what you send it - the plugin resides on the system being checked, so if that system is not reachable the check fails. This is a replacement for “Ping” when ICMP is blocked.

define command{
command_name check_dummy_ssh
command_line $USER1$/check_by_ssh -H $HOSTADDRESS$ -l nagios -t 30 -C "/home/nagios/plugins/check_dummy 0"
}

I know this is a lot - I hope I am not upsetting too many people by posting such a large amount of info here, but I figured it was pertinent.

Let me know if you see anything I don’t.

timbCFCA · July 22, 2011, 2:07pm

Your config looks good to me. Most of the problems I’ve seen like this have related to inheritance issues but everything here looks good.
Did you at one point use the ping check to see if the host is alive for your new stealth device? I’ve seen cases where old definitions get stuck and need to be cleared out - I ran into the same thing trying to convert a couple of boxes in my DMZ to use RDP port access for a host check instead of ping. If memory serves, I temporarily disabled the retention of state data and it cleared up. My guess is that deleting the state retention file and restarting Nagios would do the trick as well.
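
For reference, here is a rough sketch of what temporarily disabling state retention could look like in the main configuration file. Both directives are standard nagios.cfg settings, but the retention file path shown is only an assumption (it is the usual default for a source install); use whatever your own nagios.cfg already has for state_retention_file.

```
# nagios.cfg (main configuration file) - sketch only, adjust to your install

# Temporarily turn off state retention so that previously saved host
# attributes are not loaded back in when Nagios restarts.
retain_state_information=0

# The file Nagios saves retained state to and reads it from at startup.
# ASSUMED path; packaged installs often keep it somewhere else.
state_retention_file=/usr/local/nagios/var/retention.dat
```

Once Nagios has been restarted and the ping host check is confirmed gone, retain_state_information can be set back to 1. The alternative route is to stop Nagios, delete the file that state_retention_file points to, and start Nagios again. Either way, whatever state Nagios had retained for all hosts and services is discarded, not just the state of the new devices.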