I installed Nagios 2.5 on an x86 machine running Solaris 10. What I've noticed is that by the next day most of my services are no longer being checked. It seems that Nagios is going stale. I checked the logs and didn't see any error messages, unless I am not logging properly. After I restart Nagios, all my services start getting refreshed again.
I have heard of the occasional service check becoming orphaned, but never of so many at once. To help fix your problem, enable orphan checking. Check the Nagios docs and look in your nagios.cfg for how to do this.
Thanks. I did see a discussion about this that you posted on Nagios-Forum.co.uk.
I made the change, but I am still getting a lot of stale services. I did see a lot of orphaned messages in the log.
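To see whether the orphaned messages cluster around the time the checks stop, it may help to count them per hour. The following is only a rough sketch: it assumes each log line starts with a Unix timestamp in square brackets and simply matches the word "orphaned" rather than an exact message. The function name and the sample lines are made up for illustration.

```python
import re
from collections import Counter
from datetime import datetime, timezone

# Assumption: every Nagios log line looks like "[<unix epoch>] <message>".
LINE_RE = re.compile(r"^\[(\d+)\]\s+(.*)$")


def count_orphan_warnings(lines):
    """Count log lines that mention 'orphaned', grouped by UTC hour."""
    per_hour = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip anything that is not a timestamped log line
        epoch, message = match.groups()
        if "orphaned" not in message.lower():
            continue
        stamp = datetime.fromtimestamp(int(epoch), tz=timezone.utc)
        per_hour[stamp.strftime("%Y-%m-%d %H:00")] += 1
    return per_hour


# Invented sample lines, only to show the shape of the input.
sample_lines = [
    "[1161216000] Warning: check of service 'PING' on host 'web1' looks orphaned",
    "[1161216060] SERVICE ALERT: web1;PING;OK;HARD;1;PING OK",
    "[1161219700] Warning: check of service 'HTTP' on host 'web2' looks orphaned",
]

for hour, count in sorted(count_orphan_warnings(sample_lines).items()):
    print(hour, count)
```

To run it against the real log, pass an open file handle for the path set in log_file (here /var/nagios/nagios.log) instead of sample_lines.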
Below is my nagios.cfg:
log_file=/var/nagios/nagios.log
cfg_file=/apps/nagios/2.5/etc/checkcommands.cfg
cfg_file=/apps/nagios/2.5/etc/misccommands.cfg
cfg_file=/apps/nagios/2.5/etc/contactgroups.cfg
cfg_file=/apps/nagios/2.5/etc/contacts.cfg
cfg_file=/apps/nagios/2.5/etc/dependencies.cfg
cfg_file=/apps/nagios/2.5/etc/escalations.cfg
cfg_file=/apps/nagios/2.5/etc/hostgroups.cfg
cfg_file=/apps/nagios/2.5/etc/hosts.cfg
cfg_file=/apps/nagios/2.5/etc/services.cfg
cfg_file=/apps/nagios/2.5/etc/timeperiods.cfg
cfg_file=/apps/nagios/2.5/etc/serviceextinfo.cfg
cfg_dir=/apps/nagios/2.5/etc/servers
object_cache_file=/var/nagios/objects.cache
resource_file=/apps/nagios/2.5/etc/resource.cfg
status_file=/var/nagios/status.dat
nagios_user=nagios
nagios_group=nagios
check_external_commands=1
command_check_interval=15s
command_file=/apps/nagios/2.5/var/rw/nagios.cmd
comment_file=/var/nagios/comments.dat
downtime_file=/var/nagios/downtime.dat
lock_file=/var/nagios/nagios.lock
temp_file=/var/nagios/nagios.tmp
event_broker_options=-1
log_rotation_method=d
log_archive_path=/var/nagios/archives
use_syslog=0
log_notifications=1
log_service_retries=1
log_host_retries=1
log_event_handlers=1
log_initial_states=1
log_external_commands=1
log_passive_checks=1
service_inter_check_delay_method=s
max_service_check_spread=30
service_interleave_factor=s
host_inter_check_delay_method=s
max_host_check_spread=30
max_concurrent_checks=0
service_reaper_frequency=10
auto_reschedule_checks=0
auto_rescheduling_interval=30
auto_rescheduling_window=180
sleep_time=0.25
service_check_timeout=60
host_check_timeout=30
event_handler_timeout=30
notification_timeout=30
ocsp_timeout=5
perfdata_timeout=5
retain_state_information=1
state_retention_file=/var/nagios/retention.dat
retention_update_interval=60
use_retained_program_state=1
use_retained_scheduling_info=1
interval_length=60
use_aggressive_host_checking=0
execute_service_checks=1
accept_passive_service_checks=1
execute_host_checks=1
accept_passive_host_checks=1
enable_notifications=1
enable_event_handlers=1
process_performance_data=1
host_perfdata_command=process-host-perfdata
service_perfdata_command=process-service-perfdata
host_perfdata_file=/var/nagios/host-perfdata
service_perfdata_file=/var/nagios/service-perfdata
host_perfdata_file_template=[HOSTPERFDATA]\t$TIMET$\t$HOSTNAME$\t$HOSTEXECUTIONTIME$\t$HOSTOUTPUT$\t$HOSTPERFDATA$
service_perfdata_file_template=[SERVICEPERFDATA]\t$TIMET$\t$HOSTNAME$\t$SERVICEDESC$\t$SERVICEEXECUTIONTIME$\t$SERVICELATENCY$\t$SERVICEOUTPUT$\t$SERVICEPERFDATA$
host_perfdata_file_mode=a
service_perfdata_file_mode=a
host_perfdata_file_processing_interval=0
service_perfdata_file_processing_interval=0
host_perfdata_file_processing_command=process-host-perfdata-file
service_perfdata_file_processing_command=process-service-perfdata-file
obsess_over_services=0
check_for_orphaned_services=1
check_service_freshness=1
service_freshness_check_interval=60
check_host_freshness=1
host_freshness_check_interval=60
aggregate_status_updates=1
status_update_interval=15
enable_flap_detection=0
low_service_flap_threshold=5.0
high_service_flap_threshold=20.0
low_host_flap_threshold=5.0
high_host_flap_threshold=20.0
date_format=us
p1_file=/apps/nagios/2.5/bin/p1.pl
illegal_object_name_chars=~!$%^&*|'"<>?,()=
illegal_macro_output_chars=~$&|'"<>
use_regexp_matching=0
use_true_regexp_matching=0
[email protected]
[email protected]
daemon_dumps_core=0
Oh? Then I have to assume that by "stale" you mean this is a distributed server setup: the central Nagios server receives results via nsca, so all checks are performed on remote systems. If that is correct, you might be getting the stale messages because of freshness checking. If so, find out why the remote systems have stopped sending data via nsca.
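One way to see which services have gone quiet is to read the status file and list those whose last check is older than some limit. This is a minimal sketch, not a supported tool. It assumes status.dat consists of blocks like "service { key=value ... }" with host_name, service_description and last_check fields (later Nagios versions name the block "servicestatus", so both are accepted). The function names and the sample data are hypothetical.

```python
import re
import time

# Assumption: a block opens with a line such as "service {".
BLOCK_RE = re.compile(r"^(\w+)\s*\{$")


def parse_status_blocks(text):
    """Yield (block_type, fields) for each 'name { key=value ... }' block."""
    block_type, fields = None, {}
    for raw in text.splitlines():
        line = raw.strip()
        opener = BLOCK_RE.match(line)
        if opener:
            block_type, fields = opener.group(1), {}
        elif line == "}" and block_type is not None:
            yield block_type, fields
            block_type = None
        elif block_type is not None and "=" in line:
            key, _, value = line.partition("=")
            fields[key] = value


def stale_services(text, max_age_seconds, now=None):
    """Return (host, service, age) for services not checked recently."""
    now = int(time.time()) if now is None else now
    stale = []
    for block_type, fields in parse_status_blocks(text):
        if block_type not in ("service", "servicestatus"):
            continue
        try:
            last_check = int(fields.get("last_check", "0"))
        except ValueError:
            continue  # ignore blocks with an unreadable timestamp
        age = now - last_check
        if age > max_age_seconds:
            stale.append((fields.get("host_name", "?"),
                          fields.get("service_description", "?"), age))
    # Oldest results first.
    return sorted(stale, key=lambda item: item[2], reverse=True)


# Invented sample in the assumed layout.
SAMPLE = """
service {
    host_name=web1
    service_description=PING
    last_check=999900
    }
service {
    host_name=web2
    service_description=HTTP
    last_check=990000
    }
"""

for host, service, age in stale_services(SAMPLE, 600, now=1000000):
    print(host, service, age)
```

For the real file, read the path set in status_file (here /var/nagios/status.dat) and pass its contents as text. A sensible max_age_seconds would be somewhat larger than the freshness threshold configured for the passive services.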