Passive service with freshness


#1

Hi all, new to the forum and software.
I have active and passive service checks for all hosts. The passive service receives input from legacy monitors that alert only on failure. I have modified these to send a 0 status after the failure, to clear warnings and failures from the GUI. We get email notification and paging for warning and critical, so the GUI is not that important to the SAs. However, the bean counters want the operators to sit in front of the plasma screen looking at the colors. We want the 85 hosts disabled under Active Checks to go away. I followed the instructions in the FAQ and cannot seem to get freshness to work, let alone get the “hosts disabled” to go away. This may be due to the fact that we have both active and passive checks for all clients… or I just don’t know what I am doing :cry:
Here is what I have

Nagios 2.0b3 running on redhat 8 E3
NSCA - Nagios Service Check Acceptor
Copyright © 2000-2003 Ethan Galstad ([email protected])
Version: 2.4
Clients are AIX with send_nsca installed and functional
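
For reference, a minimal sketch of how a client could send the clearing 0 status through the NSCA client, send_nsca. send_nsca reads tab-separated lines (host, service description, return code, output) on standard input. The function name, the host name aixhost01, the server name nagios-server and both paths in the usage comment are assumptions for illustration, not taken from the setup above.

```shell
#!/bin/sh
# Format one passive service result in the tab-separated form that
# send_nsca reads from standard input.
# Arguments: host name, service description, return code, plugin output.
format_nsca_result() {
    printf '%s\t%s\t%s\t%s\n' "$1" "$2" "$3" "$4"
}

# Usage (host names and paths are hypothetical):
# format_nsca_result aixhost01 AIX-passive 0 "OK: condition cleared" |
#     /usr/local/nagios/bin/send_nsca -H nagios-server -c /usr/local/nagios/etc/send_nsca.cfg
```

The service description has to match the one in the service definition exactly (AIX-passive in the config below), otherwise Nagios will not match the result to a service.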

accept_passive_host_checks=1
accept_passive_service_checks=1
admin_email=nagios
admin_pager=pagenagios
aggregate_status_updates=1
auto_reschedule_checks=0
auto_rescheduling_interval=30
auto_rescheduling_window=180
cfg_dir=/usr/local/nagios/etc/network
cfg_dir=/usr/local/nagios/etc/san
cfg_dir=/usr/local/nagios/etc/unix
cfg_dir=/usr/local/nagios/etc/windows
cfg_file=/usr/local/nagios/etc/checkcommands.cfg
cfg_file=/usr/local/nagios/etc/contactgroups.cfg
cfg_file=/usr/local/nagios/etc/contacts.cfg
cfg_file=/usr/local/nagios/etc/hostgroups.cfg
cfg_file=/usr/local/nagios/etc/misccommands.cfg
cfg_file=/usr/local/nagios/etc/timeperiods.cfg
check_external_commands=1
check_for_orphaned_services=0
check_host_freshness=0
check_service_freshness=1
command_check_interval=-1
command_file=/usr/local/nagios/var/rw/nagios.cmd
comment_file=/usr/local/nagios/var/comments.dat
daemon_dumps_core=0
date_format=us
downtime_file=/usr/local/nagios/var/downtime.dat
enable_event_handlers=1
enable_flap_detection=0
enable_notifications=1
event_broker_options=-1
event_handler_timeout=30
execute_host_checks=1
execute_service_checks=1
high_host_flap_threshold=20.0
high_service_flap_threshold=20.0
host_check_timeout=30
host_freshness_check_interval=60
host_inter_check_delay_method=s
illegal_macro_output_chars=~$:cry:'"<>
illegal_object_name_chars=~!$%^&*|'"<>?,()=
interval_length=60
lock_file=/usr/local/nagios/var/nagios.lock
log_archive_path=/usr/local/nagios/var/archives
log_event_handlers=1
log_external_commands=1
log_file=/usr/local/nagios/var/nagios.log
log_host_retries=1
log_initial_states=0
log_notifications=1
log_passive_checks=1
log_rotation_method=d
log_service_retries=1
low_host_flap_threshold=5.0
low_service_flap_threshold=5.0
max_concurrent_checks=0
max_host_check_spread=30
max_service_check_spread=30
nagios_group=nagios
nagios_user=nagios
notification_timeout=30
object_cache_file=/usr/local/nagios/var/objects.cache
obsess_over_services=1
ocsp_timeout=5
p1_file=/usr/local/nagios/bin/p1.pl
perfdata_timeout=5
process_performance_data=0
resource_file=/usr/local/nagios/etc/resource.cfg
retain_state_information=1
retention_update_interval=60
service_check_timeout=60
service_freshness_check_interval=60
service_inter_check_delay_method=s
service_interleave_factor=s
service_reaper_frequency=10
sleep_time=0.25
state_retention_file=/usr/local/nagios/var/retention.dat
status_file=/usr/local/nagios/var/status.dat
status_update_interval=15
temp_file=/usr/local/nagios/var/nagios.tmp
use_aggressive_host_checking=0
use_regexp_matching=0
use_retained_program_state=1
use_retained_scheduling_info=0
use_syslog=1
use_true_regexp_matching=0

############################################

service.cfg
define service{
use unix-generic-service ; Name of service template to use
hostgroup_name AIX-PROD
service_description PING
is_volatile 0
check_period 24x7
max_check_attempts 4
normal_check_interval 5
retry_check_interval 1
contact_groups unix_admins
notification_interval 960
notification_period 24x7
check_command check_ping!100.0,20%!500.0,60%
}

define service{
use unix-generic-service ; Name of service template to use
hostgroup_name AIX-NONP
service_description PING
is_volatile 0
check_period 24x7
max_check_attempts 4
normal_check_interval 5
retry_check_interval 1
contact_groups unix_admins_nonp
notification_interval 960
notification_period 24x7
check_command check_ping!100.0,20%!500.0,60%
}

define service{
use unix-generic-service ; Name of service template to use
host_name cmhsawlpnv01
service_description Root Partition
is_volatile 0
check_period 24x7
max_check_attempts 4
normal_check_interval 5
retry_check_interval 1
contact_groups unix_admins
notification_interval 960
notification_period 24x7
check_command check_local_disk!20%!10%!/
}

define service{
use unix-generic-service ; Name of service template to use
host_name cmhsawlpnv01
service_description Current Users
is_volatile 0
check_period 24x7
max_check_attempts 4
normal_check_interval 5
retry_check_interval 1
contact_groups unix_admins
notification_interval 960
notification_period 24x7
check_command check_local_users!20!50
}

define service{
use unix-generic-service ; Name of service template to use
host_name cmhsawlpnv01
service_description Total Processes
is_volatile 0
check_period 24x7
max_check_attempts 4
normal_check_interval 5
retry_check_interval 1
contact_groups unix_admins
notification_interval 960
notification_period 24x7
check_command check_local_procs!250!400
}

define service{
use unix-generic-service ; Name of service template to use
host_name cmhsawlpnv01
service_description Current Load
is_volatile 0
check_period 24x7
max_check_attempts 4
normal_check_interval 5
retry_check_interval 1
contact_groups unix_admins
notification_interval 960
notification_period 24x7
check_command check_local_load!5.0,4.0,3.0!10.0,6.0,4.0
}

define service{
use unix-generic-service ; Name of service template to use

    hostgroup_name                  AIX-PROD
    service_description             AIX-passive
    is_volatile                     1
    active_checks_enabled           0
    passive_checks_enabled          1
    check_period                    never
    check_freshness                 1
    freshness_threshold             50
    max_check_attempts              1
    normal_check_interval           1
    retry_check_interval            1
    contact_groups                  unix_admins
    notification_interval           960
    notification_period             24x7
    notification_options            w,u,c
    check_command                   passive_service
    }

define service{
use unix-generic-service ; Name of service template to use

    hostgroup_name                  AIX-NONP
    service_description             AIX-passive
    is_volatile                     1
    active_checks_enabled           0
    passive_checks_enabled          1
    check_period                    never
    check_freshness                 1
    freshness_threshold             50
    max_check_attempts              1
    normal_check_interval           1
    retry_check_interval            1
    contact_groups                  unix_admins
    notification_interval           960
    notification_period             24x7
    notification_options            w,u,c
    check_command                   passive_service
    }

passive_service is defined; it echoes a statement and returns 0.

If you need anything else just say the word.
I see nothing in the log that says anything about a freshness check… not sure whether that is logged or not. I also have one host that I set to warning, and it has never been refreshed by the passive_service script…
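
One way to test this by hand is to write a passive result straight into the external command file on the Nagios server. The sketch below only formats the PROCESS_SERVICE_CHECK_RESULT line. The function name and the host name aixhost01 are hypothetical, and it assumes a `date` that supports `+%s` (true on Linux, not guaranteed elsewhere).

```shell
#!/bin/sh
# Format one external-command line that submits a passive service result.
# Arguments: host name, service description, return code, plugin output.
format_passive_result() {
    printf '[%s] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%s;%s\n' \
        "$(date +%s)" "$1" "$2" "$3" "$4"
}

# Usage, writing to the command_file path from the nagios.cfg above
# (the host name is hypothetical):
# format_passive_result aixhost01 AIX-passive 1 "WARNING: set by hand" \
#     > /usr/local/nagios/var/rw/nagios.cmd
```

With log_passive_checks=1, as in the config above, the submitted result should then show up in nagios.log, which gives a known starting time for watching whether the freshness check fires.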
Thanks in advance
Doug



#2

Please be more clear. I’m going to assume you are talking about the colors in the “tactical display”. Why you are having them look at that page, I don’t know. If I were counting the beans, they would be looking at the “service problems” display. If something pops up on that screen, then there is a problem.

Freshness checking is quite simple.
Enable freshness checking for the service with a threshold of 10 minutes (or so) if your passive checks report every 5 minutes. The check_command you define for it in services.cfg is “service_is_stale”; Nagios runs it as an active check when no passive result has arrived within the threshold.
The script is "/usr/local/nagios/libexec/staleservice.sh"
This script does nothing but generate a “service is stale” output and put the service in a critical state.
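
A minimal sketch of what such a staleservice.sh could look like, assuming the usual plugin convention that exit code 2 means CRITICAL. The message text and the function name are made up for illustration; the actual script is not shown in this thread.

```shell
#!/bin/sh
# Hypothetical staleservice.sh: report the service as stale and CRITICAL.
# Nagios takes the first line of output as the status text and the exit
# code as the state (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).
stale_service() {
    echo "CRITICAL: service is stale (no passive result within the freshness threshold)"
    return 2
}

# Act as a plugin only when run under its own file name, so the function
# can also be loaded and tried out from another script.
if [ "${0##*/}" = "staleservice.sh" ]; then
    stale_service
    exit $?
fi
```

A command definition named service_is_stale would then point its command_line at this script.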

If you want the 85 hosts in the tactical display to go away, then rewrite the CGI. I personally don’t see the need, since these checks are, in fact, disabled (not active) and get their data from passive checks. I know what you would like to see, but it’s not there. A code change is all I can see.

But seriously, take that tactical link off nagios (to make the boss happy), and direct them to look at the Service Problems. Save yourself some trouble that way.


#3

Thanks! The other issue I had was that freshness was not working… I figured that one out… in the FAQ they say to
Define check_period none for this service.
I had done that, and therefore freshness was not working. Once I set that back to 24x7, freshness worked… still red on the tac screen, but I can put some black tape over that section of the screen and they won’t see it :?
Thanks for the help
Doug
Edited Thu Aug 25 2005, 10:16AM
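
Based on the finding above, a rough sketch of a check that lists service definitions combining check_freshness 1 with a check_period of none or never. The function name is hypothetical, and it only looks at directives written directly in each define service block, so values inherited from a template through "use" are not seen.

```shell
#!/bin/sh
# List services that enable freshness checking but use a check_period
# of "none" or "never". Arguments: one or more Nagios object config files.
find_freshness_period_conflicts() {
    awk '
        /^[ \t]*define[ \t]+service[ \t]*[{]/ {
            in_svc = 1; fresh = 0; period = ""; desc = ""; next
        }
        in_svc && $1 == "check_freshness" { fresh = ($2 == "1") }
        in_svc && $1 == "check_period"    { period = $2 }
        in_svc && $1 == "service_description" {
            desc = $0
            sub(/^[ \t]*service_description[ \t]+/, "", desc)  # drop the directive name
            sub(/[ \t]*(;.*)?$/, "", desc)                     # drop trailing blanks/comment
        }
        in_svc && /^[ \t]*[}]/ {
            if (fresh && (period == "none" || period == "never"))
                print desc ": check_period " period
            in_svc = 0
        }
    ' "$@"
}

# Usage (path is hypothetical):
# find_freshness_period_conflicts /usr/local/nagios/etc/unix/services.cfg
```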


#4

Edit “side.html” in the share dir.
Comment out the HTML starting at about line 80:
<tr>
<td width=13><img src="images/greendot.gif" width="13" height="14" name="tac-dot"></td>
<td nowrap width=134><a href="/nagios/cgi-bin/tac.cgi" target="main" onMouseOver="switchdot('tac-dot',1)" onMouseOut="switchdot('tac-dot',0)" class="NavBarItem">Tactical Overview</a></td>
</tr>

Anyway, you get the idea: out of sight, out of mind.