Nagios sizing question


#1

Hi,

I have a Nagios installation that monitors 359 hosts and receives 5225 passive checks from NSCA every 5 minutes. I have read the documentation on sizing, but I still have a weak understanding of the right values to put in my config for this volume and frequency (359 hosts, 5225 passive checks every 5 minutes). Once in a while I receive a stale check result. Based on my server's hardware specs, I’m assuming I can handle more servers and incoming service checks without stale results.

Here are my relevant configuration entries:

check_external_commands=1
check_for_orphaned_hosts=1
check_for_orphaned_services=1
check_host_freshness=1
check_result_path=/var/log/nagios/spool/checkresults
check_result_reaper_frequency=30
check_service_freshness=1
child_processes_fork_twice=0
command_check_interval=-1
command_file=/var/log/nagios/rw/nagios.cmd
daemon_dumps_core=0
date_format=us
debug_file=/var/log/nagios/nagios.debug
debug_level=0
debug_verbosity=1
enable_embedded_perl=1
enable_environment_macros=0
enable_event_handlers=1
enable_flap_detection=1
enable_notifications=1
enable_predictive_host_dependency_checks=1
enable_predictive_service_dependency_checks=1
event_broker_options=-1
event_handler_timeout=30
execute_host_checks=1
execute_service_checks=1
external_command_buffer_slots=8192
free_child_process_memory=0
high_host_flap_threshold=20.0
high_service_flap_threshold=20.0
host_check_timeout=30
host_freshness_check_interval=60
host_inter_check_delay_method=s
illegal_macro_output_chars=`~$&|'"<>
illegal_object_name_chars=`~!$%^&*|'"<>?,()=
interval_length=60
lock_file=/var/run/nagios.pid
log_archive_path=/var/log/nagios/archives
log_event_handlers=1
log_external_commands=1
log_file=/var/log/nagios/nagios.log
log_host_retries=1
log_initial_states=0
log_notifications=1
log_passive_checks=1
log_rotation_method=d
log_service_retries=1
low_host_flap_threshold=5.0
low_service_flap_threshold=5.0
max_check_result_file_age=3600
max_check_result_reaper_time=60
max_concurrent_checks=0
max_debug_file_size=1000000
max_host_check_spread=30
max_service_check_spread=30
nagios_group=nagios
nagios_user=nagios
notification_timeout=30
object_cache_file=/var/log/nagios/objects.cache
obsess_over_hosts=0
obsess_over_services=0
ocsp_timeout=5
p1_file=/usr/local/nagios/sbin/p1.pl
passive_host_checks_are_soft=0
perfdata_timeout=5
precached_object_file=/var/log/nagios/objects.precache
process_performance_data=0
resource_file=/etc/nagios/resource.cfg
retained_contact_host_attribute_mask=0
retained_contact_service_attribute_mask=0
retained_host_attribute_mask=0
retained_process_host_attribute_mask=0
retained_process_service_attribute_mask=0
retained_service_attribute_mask=0
retain_state_information=1
retention_update_interval=60
service_check_timeout=60
service_freshness_check_interval=360
service_inter_check_delay_method=s
service_interleave_factor=s
sleep_time=0.125
soft_state_dependencies=0
state_retention_file=/var/log/nagios/retention.dat
status_file=/var/log/nagios/status.dat
status_update_interval=20
temp_file=/var/log/nagios/nagios.tmp
temp_path=/tmp
translate_passive_host_checks=0
use_aggressive_host_checking=0
use_embedded_perl_implicitly=1
use_large_installation_tweaks=1
use_regexp_matching=1
use_retained_program_state=0
use_retained_scheduling_info=1
use_syslog=0
use_true_regexp_matching=0

Which parameters do you think need tuning, based on my numbers? Please advise.

Cheers,
Ace


#2

Maybe the host and service freshness check intervals, because those parameters in nagios.cfg are in seconds. If you set freshness_threshold in every passive service definition, that value overrides the program-wide setting in nagios.cfg.
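
For example, a passive service with its own freshness threshold could look roughly like the sketch below. The host name, service description, template name, and the service_is_stale command are placeholders I made up, and 600 seconds is just an assumed value (two of your 5-minute submission intervals), so adjust them to your setup. It also assumes $USER1$ points at your plugin directory and that check_dummy from the standard plugins is installed.

```
# Sketch only: names and values below are illustrative assumptions.
define service {
    use                     generic-service        ; assumed template name
    host_name               example-host           ; placeholder host
    service_description     Example Passive Check  ; placeholder service
    active_checks_enabled   0                      ; results arrive via NSCA only
    passive_checks_enabled  1
    check_freshness         1                      ; enable freshness checking for this service
    freshness_threshold     600                    ; seconds; assumed value (2 x 5 minutes)
    check_command           service_is_stale       ; run when the result goes stale
}

# Hypothetical command that marks the service CRITICAL when no result arrived in time.
define command {
    command_name    service_is_stale
    command_line    $USER1$/check_dummy 2 "CRITICAL: no passive result received within freshness threshold"
}
```

With a threshold noticeably longer than the 5-minute submission interval, a result that arrives a little late should not be flagged as stale right away.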

Check that out, and see if it helps.