Hey Tim - thanks for the reply - let me post my config files here and see if you see something hairy:
Host Config:
define host {
use noping-host
host_name host_name
alias alias
hostgroups groupname
address 172.xx.x.xxx
}
Then the four service definitions for that host (they live in the same host.cfg file):
define service {
use generic-service
service_description check_dummy_ssh
host_name Host1, Host2, Host3
check_command !check_dummy_ssh!
}
define service {
use generic-service
service_description check_ssh_disk
host_name Host1, Host2, Host3
check_command !check_ssh_disk!
}
define service {
use generic-service
service_description check_ssh_swap
host_name Host1, Host2, Host3
check_command !check_ssh_swap!
}
define service {
use generic-service
service_description check_ssh_load
host_name Host1, Host2, Host3
check_command !check_ssh_load!
}
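For comparison, my understanding of the object definition docs is that check_command takes the form command_name!arg1!arg2, with ! only separating the command name from its arguments. Since none of my commands take arguments, a version of the first service without the surrounding exclamation marks would presumably look like this (just a sketch, not something I've confirmed fixes anything):

```
define service {
use generic-service
service_description check_dummy_ssh
host_name Host1, Host2, Host3
check_command check_dummy_ssh ; bare command name, no leading or trailing !
}
```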
The noping-host and generic-host templates:
define host{
name noping-host ; The name of this host template
use generic-host ; This template inherits other values from the generic-host template
check_period 24x7 ; By default, Linux hosts are checked round the clock
check_interval 15 ; Actively check the host every 15 minutes
retry_interval 1 ; Schedule host check retries at 1 minute intervals
max_check_attempts 5 ; Check each Linux host 5 times (max)
check_command check_dummy_ssh
notification_interval 60 ; Resend notifications every 60 minutes
notification_options d,u,r ; Only send notifications for specific host states
contact_groups admins ; Notifications get sent to the admins by default
register 0 ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL HOST, JUST A TEMPLATE!
}
define host{
name generic-host ; The name of this host template
notifications_enabled 1 ; Host notifications are enabled
event_handler_enabled 1 ; Host event handler is enabled
flap_detection_enabled 1 ; Flap detection is enabled
failure_prediction_enabled 1 ; Failure prediction is enabled
process_perf_data 1 ; Process performance data
retain_status_information 1 ; Retain status information across program restarts
retain_nonstatus_information 1 ; Retain non-status information across program restarts
contact_groups admins ; Notifications get sent to the admins by default
notification_period 24x7 ; Send host notifications at any time
register 0 ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL HOST, JUST A TEMPLATE!
}
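As I understand template inheritance, values set directly on an object win over values from the template it uses, and the name and register directives are not inherited. Under that assumption, the host above should resolve to roughly the following (a hand-flattened sketch, not output from Nagios):

```
define host {
host_name host_name ; from the host definition itself
alias alias
hostgroups groupname
address 172.xx.x.xxx
check_period 24x7 ; from noping-host
check_interval 15
retry_interval 1
max_check_attempts 5
check_command check_dummy_ssh
notification_interval 60
notification_options d,u,r
contact_groups admins ; set in both templates; noping-host is closer, so its value is used
notifications_enabled 1 ; from generic-host
event_handler_enabled 1
flap_detection_enabled 1
failure_prediction_enabled 1
process_perf_data 1
retain_status_information 1
retain_nonstatus_information 1
notification_period 24x7
}
```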
and the service template:
define service{
name generic-service ; The 'name' of this service template
active_checks_enabled 1 ; Active service checks are enabled
passive_checks_enabled 1 ; Passive service checks are enabled/accepted
parallelize_check 1 ; Active service checks should be parallelized (disabling this can lead to major performance problems)
obsess_over_service 1 ; We should obsess over this service (if necessary)
check_freshness 0 ; Default is to NOT check service 'freshness'
notifications_enabled 1 ; Service notifications are enabled
event_handler_enabled 1 ; Service event handler is enabled
flap_detection_enabled 1 ; Flap detection is enabled
failure_prediction_enabled 1 ; Failure prediction is enabled
process_perf_data 1 ; Process performance data
retain_status_information 1 ; Retain status information across program restarts
retain_nonstatus_information 1 ; Retain non-status information across program restarts
is_volatile 0 ; The service is not volatile
check_period 24x7 ; The service can be checked at any time of the day
max_check_attempts 3 ; Re-check the service up to 3 times in order to determine its final (hard) state
normal_check_interval 15 ; Check the service every 15 minutes under normal conditions
retry_check_interval 1 ; Re-check the service every minute until a hard state can be determined
contact_groups admins, techops ; Notifications get sent out to everyone in the 'admins' and 'techops' groups
notification_options w,u,c,r ; Send notifications about warning, unknown, critical, and recovery events
notification_interval 60 ; Re-notify about service problems every hour
notification_period 24x7 ; Notifications can be sent out at any time
register 0 ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL SERVICE, JUST A TEMPLATE!
}
and finally the check commands:
‘check_ssh_swap’ command definition
define command{
command_name check_ssh_swap
command_line /usr/lib/nagios/plugins/check_by_ssh -H $HOSTADDRESS$ -l username -t 60 -C "/home/nagios/plugins/check_swap -a -w15% -c5%"
}
‘check_ssh_load’ command definition
define command{
command_name check_ssh_load
command_line $USER1$/check_by_ssh -H $HOSTADDRESS$ -l username -t 30 -C "/home/nagios/plugins/check_load -r -w 85,85,85 -c 95,95,95"
}
‘check_ssh_disk’ command definition
define command{
command_name check_ssh_disk
command_line $USER1$/check_by_ssh -H $HOSTADDRESS$ -l nagios -t 30 -C "/home/nagios/plugins/check_disk -w15% -c5% -l"
}
‘check_dummy_ssh’ command definition - returns whatever you send it. The plugin resides on the system being checked, so if that system isn't reachable the check fails. This is a replacement for “Ping” when ICMP is blocked.
define command{
command_name check_dummy_ssh
command_line $USER1$/check_by_ssh -H $HOSTADDRESS$ -l nagios -t 30 -C "/home/nagios/plugins/check_dummy 0"
}
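In case it helps anyone comparing, my understanding is that the ! syntax in a service's check_command is what feeds $ARG1$, $ARG2$ and so on in a command definition. Here is a sketch of a parameterized variant of the disk check, where check_ssh_disk_args is a hypothetical name I made up for illustration and the thresholds come from the service instead of being hard-coded. The service shown would replace the existing check_ssh_disk service, not sit alongside it:

```
define command{
command_name check_ssh_disk_args ; hypothetical name, not in my real config
command_line $USER1$/check_by_ssh -H $HOSTADDRESS$ -l nagios -t 30 -C "/home/nagios/plugins/check_disk -w $ARG1$ -c $ARG2$ -l"
}
define service {
use generic-service
service_description check_ssh_disk
host_name Host1, Host2, Host3
check_command check_ssh_disk_args!15%!5% ; $ARG1$ = 15%, $ARG2$ = 5%
}
```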
I know this is a lot - I hope I am not upsetting too many people by posting such a large amount of info here, but I figured it was pertinent.
Let me know if you see anything I don’t.