Very strange - host/service schedule checks


#1

Hi all,

I upgraded to the latest stable versions of nagios-core, nagios-plugins, rrdtool, and Nagios Grapher, and things went bad…
Now the versions are:

  • Nagios v3.2.1 Stable
  • Nagios Grapher 1.7.1
  • rrdtool 1.4.3
  • NagiosQL 3.0.3

The problem:
When I stop and start Nagios with the rc script, all 813 hosts get their initial check, but only the hosts, not the services…
After that, scheduled checks run for neither the hosts nor the services. For example, the check_interval of the service that pings the hosts is 4 minutes, but nothing happens after 4 or more minutes…only the initial check works…
For now I restart Nagios from crontab every 6 minutes so that at least something gets checked, but I need to figure out what is happening…

Here are some config files:

nagios.cfg:


log_file=/var/nagios/nagios.log

# You can specify individual object config files as shown below:
cfg_file=/etc/nagios/commands.cfg

cfg_file=/etc/nagios/contactgroups.cfg
cfg_file=/etc/nagios/contacts.cfg

cfg_file=/etc/nagios/hostescalations.cfg
cfg_file=/etc/nagios/timeperiods.cfg 
cfg_file=/etc/nagios/hostgroups.cfg
cfg_file=/etc/nagios/servicegroups.cfg

cfg_dir=/etc/nagios/hosts
cfg_dir=/etc/nagios/services
cfg_dir=/etc/nagios/serviceext

# Definitions for monitoring the local (Linux) host
#cfg_file=/etc/nagios/objects/localhost.cfg

# Definitions for monitoring a Windows machine
#cfg_file=/usr/local/nagios/etc/objects/windows.cfg

# Definitions for monitoring a router/switch
#cfg_file=/usr/local/nagios/etc/objects/switch.cfg

# Definitions for monitoring a network printer
#cfg_file=/usr/local/nagios/etc/objects/printer.cfg

#cfg_dir=/usr/local/nagios/etc/servers
#cfg_dir=/usr/local/nagios/etc/printers
#cfg_dir=/usr/local/nagios/etc/switches
#cfg_dir=/usr/local/nagios/etc/routers
cfg_file=/etc/nagios/hostextinfo.cfg

object_cache_file=/var/nagios/objects.cache

precached_object_file=/var/nagios/objects.precache

resource_file=/etc/nagios/resource.cfg

status_file=/var/nagios/status.dat

status_update_interval=10.

nagios_user=nagios

nagios_group=nagios

check_external_commands=1

#command_check_interval=15s
command_check_interval=-1

command_file=/var/nagios/rw/nagios.cmd

external_command_buffer_slots=4096

lock_file=/var/nagios/nagios.lock

temp_file=/var/nagios/nagios.tmp

temp_path=/tmp

event_broker_options=-1

#broker_module=/somewhere/module1.o
#broker_module=/somewhere/module2.o arg1 arg2=3 debug=0

log_rotation_method=d

log_archive_path=/var/nagios/archives

use_syslog=0

log_notifications=1

log_service_retries=1

log_host_retries=1

log_event_handlers=1

log_initial_states=0

log_external_commands=1

log_passive_checks=1

#global_host_event_handler=somecommand
#global_service_event_handler=somecommand

service_inter_check_delay_method=s

max_service_check_spread=30

service_interleave_factor=s

#test
host_inter_check_delay_method=n

#test
max_host_check_spread=1

max_concurrent_checks=0

check_result_reaper_frequency=10

max_check_result_reaper_time=30

check_result_path=/var/nagios/spool/checkresults

max_check_result_file_age=3600

cached_host_check_horizon=15

cached_service_check_horizon=15

enable_predictive_host_dependency_checks=1

enable_predictive_service_dependency_checks=1

soft_state_dependencies=0

#time_change_threshold=900

auto_reschedule_checks=0

auto_rescheduling_interval=30

auto_rescheduling_window=3600

sleep_time=0.25

service_check_timeout=100
host_check_timeout=20
event_handler_timeout=30
notification_timeout=30
ocsp_timeout=5
perfdata_timeout=5



# RETAIN STATE INFORMATION
# This setting determines whether or not Nagios will save state
# information for services and hosts before it shuts down.  Upon
# startup Nagios will reload all saved service and host state
# information before starting to monitor.  This is useful for 
# maintaining long-term data on state statistics, etc, but will
# slow Nagios down a bit when it (re)starts.  Since its only
# a one-time penalty, I think its well worth the additional
# startup delay.

retain_state_information=1



# STATE RETENTION FILE
# This is the file that Nagios should use to store host and
# service state information before it shuts down.  The state 
# information in this file is also read immediately prior to
# starting to monitor the network when Nagios is restarted.
# This file is used only if the retain_state_information
# variable is set to 1.

state_retention_file=/var/nagios/retention.dat



# RETENTION DATA UPDATE INTERVAL
# This setting determines how often (in minutes) that Nagios
# will automatically save retention data during normal operation.
# If you set this value to 0, Nagios will not save retention
# data at regular interval, but it will still save retention
# data before shutting down or restarting.  If you have disabled
# state retention, this option has no effect.

retention_update_interval=60



# USE RETAINED PROGRAM STATE
# This setting determines whether or not Nagios will set 
# program status variables based on the values saved in the
# retention file.  If you want to use retained program status
# information, set this value to 1.  If not, set this value
# to 0.

use_retained_program_state=1



# USE RETAINED SCHEDULING INFO
# This setting determines whether or not Nagios will retain
# the scheduling info (next check time) for hosts and services
# based on the values saved in the retention file.  If you
# want to use retained scheduling info, set this
# value to 1.  If not, set this value to 0.
use_retained_scheduling_info=0



# RETAINED ATTRIBUTE MASKS (ADVANCED FEATURE)
# The following variables are used to specify specific host and
# service attributes that should *not* be retained by Nagios during
# program restarts.
#
# The values of the masks are bitwise ANDs of values specified
# by the "MODATTR_" definitions found in include/common.h.  
# For example, if you do not want the current enabled/disabled state
# of flap detection and event handlers for hosts to be retained, you
# would use a value of 24 for the host attribute mask...
# MODATTR_EVENT_HANDLER_ENABLED (8) + MODATTR_FLAP_DETECTION_ENABLED (16) = 24

# This mask determines what host attributes are not retained
retained_host_attribute_mask=0

# This mask determines what service attributes are not retained
retained_service_attribute_mask=0

# These two masks determine what process attributes are not retained.
# There are two masks, because some process attributes have host and service
# options.  For example, you can disable active host checks, but leave active
# service checks enabled.
retained_process_host_attribute_mask=0
retained_process_service_attribute_mask=0

# These two masks determine what contact attributes are not retained.
# There are two masks, because some contact attributes have host and
# service options.  For example, you can disable host notifications for
# a contact, but leave service notifications enabled for them.
retained_contact_host_attribute_mask=0
retained_contact_service_attribute_mask=0


# INTERVAL LENGTH
# This is the seconds per unit interval as used in the
# host/contact/service configuration files.  Setting this to 60 means
# that each interval is one minute long (60 seconds).  Other settings
# have not been tested much, so your mileage is likely to vary...

interval_length=60



# AGGRESSIVE HOST CHECKING OPTION
# If you don't want to turn on aggressive host checking features, set
# this value to 0 (the default).  Otherwise set this value to 1 to
# enable the aggressive check option.  Read the docs for more info
# on what aggressive host check is or check out the source code in
# base/checks.c

use_aggressive_host_checking=0



# SERVICE CHECK EXECUTION OPTION
# This determines whether or not Nagios will actively execute
# service checks when it initially starts.  If this option is 
# disabled, checks are not actively made, but Nagios can still
# receive and process passive check results that come in.  Unless
# you're implementing redundant hosts or have a special need for
# disabling the execution of service checks, leave this enabled!
# Values: 1 = enable checks, 0 = disable checks

execute_service_checks=1


# PASSIVE SERVICE CHECK ACCEPTANCE OPTION
# This determines whether or not Nagios will accept passive
# service checks results when it initially (re)starts.
# Values: 1 = accept passive checks, 0 = reject passive checks

accept_passive_service_checks=1



# HOST CHECK EXECUTION OPTION
# This determines whether or not Nagios will actively execute
# host checks when it initially starts.  If this option is 
# disabled, checks are not actively made, but Nagios can still
# receive and process passive check results that come in.  Unless
# you're implementing redundant hosts or have a special need for
# disabling the execution of host checks, leave this enabled!
# Values: 1 = enable checks, 0 = disable checks

execute_host_checks=1



# PASSIVE HOST CHECK ACCEPTANCE OPTION
# This determines whether or not Nagios will accept passive
# host checks results when it initially (re)starts.
# Values: 1 = accept passive checks, 0 = reject passive checks

accept_passive_host_checks=1



# NOTIFICATIONS OPTION
# This determines whether or not Nagios will send out any host or
# service notifications when it is initially (re)started.
# Values: 1 = enable notifications, 0 = disable notifications
enable_notifications=1



# EVENT HANDLER USE OPTION
# This determines whether or not Nagios will run any host or
# service event handlers when it is initially (re)started.  Unless
# you're implementing redundant hosts, leave this option enabled.
# Values: 1 = enable event handlers, 0 = disable event handlers

enable_event_handlers=1



# PROCESS PERFORMANCE DATA OPTION
# This determines whether or not Nagios will process performance
# data returned from service and host checks.  If this option is
# enabled, host performance data will be processed using the
# host_perfdata_command (defined below) and service performance
# data will be processed using the service_perfdata_command (also
# defined below).  Read the HTML docs for more information on
# performance data.
# Values: 1 = process performance data, 0 = do not process performance data

process_performance_data=1



# HOST AND SERVICE PERFORMANCE DATA PROCESSING COMMANDS
# These commands are run after every host and service check is
# performed.  These commands are executed only if the
# process_performance_data option (above) is set to 1.  The command
# argument is the short name of a command definition that you 
# define in your host configuration file.  Read the HTML docs for
# more information on performance data.
#host_perfdata_command=process-host-perfdata
service_perfdata_command=process-service-perfdata



# HOST AND SERVICE PERFORMANCE DATA FILES
# These files are used to store host and service performance data.
# Performance data is only written to these files if the
# process_performance_data option (above) is set to 1.

#host_perfdata_file=/tmp/host-perfdata
#service_perfdata_file=/tmp/service-perfdata



# HOST AND SERVICE PERFORMANCE DATA FILE TEMPLATES
# These options determine what data is written (and how) to the
# performance data files.  The templates may contain macros, special
# characters (\t for tab, \r for carriage return, \n for newline)
# and plain text.  A newline is automatically added after each write
# to the performance data file.  Some examples of what you can do are
# shown below.

#host_perfdata_file_template=[HOSTPERFDATA]\t$TIMET$\t$HOSTNAME$\t$HOSTEXECUTIONTIME$\t$HOSTOUTPUT$\t$HOSTPERFDATA$
#service_perfdata_file_template=[SERVICEPERFDATA]\t$TIMET$\t$HOSTNAME$\t$SERVICEDESC$\t$SERVICEEXECUTIONTIME$\t$SERVICELATENCY$\t$SERVICEOUTPUT$\t$SERVICEPERFDATA$



# HOST AND SERVICE PERFORMANCE DATA FILE MODES
# This option determines whether or not the host and service
# performance data files are opened in write ("w") or append ("a")
# mode. If you want to use named pipes, you should use the special
# pipe ("p") mode which avoids blocking at startup, otherwise you will
# likely want the default append ("a") mode.
#host_perfdata_file_mode=a
#service_perfdata_file_mode=a



# HOST AND SERVICE PERFORMANCE DATA FILE PROCESSING INTERVAL
# These options determine how often (in seconds) the host and service
# performance data files are processed using the commands defined
# below.  A value of 0 indicates the files should not be periodically
# processed.

#host_perfdata_file_processing_interval=0
#service_perfdata_file_processing_interval=0



# HOST AND SERVICE PERFORMANCE DATA FILE PROCESSING COMMANDS
# These commands are used to periodically process the host and
# service performance data files.  The interval at which the
# processing occurs is determined by the options above.

#host_perfdata_file_processing_command=process-host-perfdata-file
#service_perfdata_file_processing_command=process-service-perfdata-file

#service_perfdata_file=/usr/local/nagios/var/service-perfdata
#service_perfdata_file_template=$HOSTNAME$\t$SERVICEDESC$\t$SERVICEOUTPUT$\t$SERVICEPERFDATA$\t$TIMET$
#service_perfdata_file_mode=a
#service_perfdata_file_processing_interval=30
#service_perfdata_file_processing_command=process-service-perfdata-file

# OBSESS OVER SERVICE CHECKS OPTION
# This determines whether or not Nagios will obsess over service
# checks and run the ocsp_command defined below.  Unless you're
# planning on implementing distributed monitoring, do not enable
# this option.  Read the HTML docs for more information on
# implementing distributed monitoring.
# Values: 1 = obsess over services, 0 = do not obsess (default)

obsess_over_services=0



# OBSESSIVE COMPULSIVE SERVICE PROCESSOR COMMAND
# This is the command that is run for every service check that is
# processed by Nagios.  This command is executed only if the
# obsess_over_services option (above) is set to 1.  The command 
# argument is the short name of a command definition that you
# define in your host configuration file. Read the HTML docs for
# more information on implementing distributed monitoring.

#ocsp_command=somecommand



# OBSESS OVER HOST CHECKS OPTION
# This determines whether or not Nagios will obsess over host
# checks and run the ochp_command defined below.  Unless you're
# planning on implementing distributed monitoring, do not enable
# this option.  Read the HTML docs for more information on
# implementing distributed monitoring.
# Values: 1 = obsess over hosts, 0 = do not obsess (default)

obsess_over_hosts=0



# OBSESSIVE COMPULSIVE HOST PROCESSOR COMMAND
# This is the command that is run for every host check that is
# processed by Nagios.  This command is executed only if the
# obsess_over_hosts option (above) is set to 1.  The command 
# argument is the short name of a command definition that you
# define in your host configuration file. Read the HTML docs for
# more information on implementing distributed monitoring.

#ochp_command=somecommand



# TRANSLATE PASSIVE HOST CHECKS OPTION
# This determines whether or not Nagios will translate
# DOWN/UNREACHABLE passive host check results into their proper
# state for this instance of Nagios.  This option is useful
# if you have distributed or failover monitoring setup.  In
# these cases your other Nagios servers probably have a different
# "view" of the network, with regards to the parent/child relationship
# of hosts.  If a distributed monitoring server thinks a host
# is DOWN, it may actually be UNREACHABLE from the point of
# this Nagios instance.  Enabling this option will tell Nagios
# to translate any DOWN or UNREACHABLE host states it receives
# passively into the correct state from the view of this server.
# Values: 1 = perform translation, 0 = do not translate (default)

translate_passive_host_checks=0



# PASSIVE HOST CHECKS ARE SOFT OPTION
# This determines whether or not Nagios will treat passive host
# checks as being HARD or SOFT.  By default, a passive host check
# result will put a host into a HARD state type.  This can be changed
# by enabling this option.
# Values: 0 = passive checks are HARD, 1 = passive checks are SOFT

passive_host_checks_are_soft=0



# ORPHANED HOST/SERVICE CHECK OPTIONS
# These options determine whether or not Nagios will periodically 
# check for orphaned host service checks.  Since service checks are
# not rescheduled until the results of their previous execution 
# instance are processed, there exists a possibility that some
# checks may never get rescheduled.  A similar situation exists for
# host checks, although the exact scheduling details differ a bit
# from service checks.  Orphaned checks seem to be a rare
# problem and should not happen under normal circumstances.
# If you have problems with service checks never getting
# rescheduled, make sure you have orphaned service checks enabled.
# Values: 1 = enable checks, 0 = disable checks

check_for_orphaned_services=1
check_for_orphaned_hosts=0



# SERVICE FRESHNESS CHECK OPTION
# This option determines whether or not Nagios will periodically
# check the "freshness" of service results.  Enabling this option
# is useful for ensuring passive checks are received in a timely
# manner.
# Values: 1 = enabled freshness checking, 0 = disable freshness checking

check_service_freshness=0



# SERVICE FRESHNESS CHECK INTERVAL
# This setting determines how often (in seconds) Nagios will
# check the "freshness" of service check results.  If you have
# disabled service freshness checking, this option has no effect.

service_freshness_check_interval=30
#service_freshness_check_interval=60


# HOST FRESHNESS CHECK OPTION
# This option determines whether or not Nagios will periodically
# check the "freshness" of host results.  Enabling this option
# is useful for ensuring passive checks are received in a timely
# manner.
# Values: 1 = enabled freshness checking, 0 = disable freshness checking

#test
check_host_freshness=1



# HOST FRESHNESS CHECK INTERVAL
# This setting determines how often (in seconds) Nagios will
# check the "freshness" of host check results.  If you have
# disabled host freshness checking, this option has no effect.

host_freshness_check_interval=60




# ADDITIONAL FRESHNESS THRESHOLD LATENCY
# This setting determines the number of seconds that Nagios
# will add to any host and service freshness thresholds that
# it calculates (those not explicitly specified by the user).

additional_freshness_latency=15




# FLAP DETECTION OPTION
# This option determines whether or not Nagios will try
# and detect hosts and services that are "flapping".  
# Flapping occurs when a host or service changes between
# states too frequently.  When Nagios detects that a 
# host or service is flapping, it will temporarily suppress
# notifications for that host/service until it stops
# flapping.  Flap detection is very experimental, so read
# the HTML documentation before enabling this feature!
# Values: 1 = enable flap detection
#         0 = disable flap detection (default)

enable_flap_detection=1



# FLAP DETECTION THRESHOLDS FOR HOSTS AND SERVICES
# Read the HTML documentation on flap detection for
# an explanation of what this option does.  This option
# has no effect if flap detection is disabled.

low_service_flap_threshold=5.0
high_service_flap_threshold=37.0
low_host_flap_threshold=5.0
high_host_flap_threshold=37.0



# DATE FORMAT OPTION
# This option determines how short dates are displayed. Valid options
# include:
#       us              (MM-DD-YYYY HH:MM:SS)
#       euro            (DD-MM-YYYY HH:MM:SS)
#       iso8601         (YYYY-MM-DD HH:MM:SS)
#       strict-iso8601  (YYYY-MM-DDTHH:MM:SS)
#
date_format=iso8601




# TIMEZONE OFFSET
# This option is used to override the default timezone that this
# instance of Nagios runs in.  If not specified, Nagios will use
# the system configured timezone.
#
# NOTE: In order to display the correct timezone in the CGIs, you
# will also need to alter the Apache directives for the CGI path 
# to include your timezone.  Example:
#
#   <Directory "/usr/local/nagios/sbin/">
#      SetEnv TZ "Australia/Brisbane"
#      ...
#   </Directory>

#use_timezone=US/Mountain
#use_timezone=Australia/Brisbane




# P1.PL FILE LOCATION
# This value determines where the p1.pl perl script (used by the
# embedded Perl interpreter) is located.  If you didn't compile
# Nagios with embedded Perl support, this option has no effect.

p1_file=/usr/local/nagios/bin/p1.pl



# EMBEDDED PERL INTERPRETER OPTION
# This option determines whether or not the embedded Perl interpreter
# will be enabled during runtime.  This option has no effect if Nagios
# has not been compiled with support for embedded Perl.
# Values: 0 = disable interpreter, 1 = enable interpreter

enable_embedded_perl=1



# EMBEDDED PERL USAGE OPTION
# This option determines whether or not Nagios will process Perl plugins
# and scripts with the embedded Perl interpreter if the plugins/scripts
# do not explicitly indicate whether or not it is okay to do so. Read
# the HTML documentation on the embedded Perl interpreter for more 
# information on how this option works.

use_embedded_perl_implicitly=1



# ILLEGAL OBJECT NAME CHARACTERS
# This option allows you to specify illegal characters that cannot
# be used in host names, service descriptions, or names of other
# object types.

illegal_object_name_chars=`~!$%^&*|'"<>?,()=



# ILLEGAL MACRO OUTPUT CHARACTERS
# This option allows you to specify illegal characters that are
# stripped from macros before being used in notifications, event
# handlers, etc.  This DOES NOT affect macros used in service or
# host check commands.
# The following macros are stripped of the characters you specify:
#       $HOSTOUTPUT$
#       $HOSTPERFDATA$
#       $HOSTACKAUTHOR$
#       $HOSTACKCOMMENT$
#       $SERVICEOUTPUT$
#       $SERVICEPERFDATA$
#       $SERVICEACKAUTHOR$
#       $SERVICEACKCOMMENT$

illegal_macro_output_chars=`~$&|'"<>



# REGULAR EXPRESSION MATCHING
# This option controls whether or not regular expression matching
# takes place in the object config files.  Regular expression
# matching is used to match host, hostgroup, service, and service
# group names/descriptions in some fields of various object types.
# Values: 1 = enable regexp matching, 0 = disable regexp matching

use_regexp_matching=0



# "TRUE" REGULAR EXPRESSION MATCHING
# This option controls whether or not "true" regular expression 
# matching takes place in the object config files.  This option
# only has an effect if regular expression matching is enabled
# (see above).  If this option is DISABLED, regular expression
# matching only occurs if a string contains wildcard characters
# (* and ?).  If the option is ENABLED, regexp matching occurs
# all the time (which can be annoying).
# Values: 1 = enable true matching, 0 = disable true matching
use_true_regexp_matching=0



# ADMINISTRATOR EMAIL/PAGER ADDRESSES
# The email and pager address of a global administrator (likely you).
# Nagios never uses these values itself, but you can access them by
# using the $ADMINEMAIL$ and $ADMINPAGER$ macros in your notification
# commands.

admin_email=nagios@localhost
admin_pager=pagenagios@localhost



# DAEMON CORE DUMP OPTION
# This option determines whether or not Nagios is allowed to create
# a core dump when it runs as a daemon.  Note that it is generally
# considered bad form to allow this, but it may be useful for
# debugging purposes.  Enabling this option doesn't guarantee that
# a core file will be produced, but that's just life...
# Values: 1 - Allow core dumps
#         0 - Do not allow core dumps (default)

daemon_dumps_core=0



# LARGE INSTALLATION TWEAKS OPTION
# This option determines whether or not Nagios will take some shortcuts
# which can save on memory and CPU usage in large Nagios installations.
# Read the documentation for more information on the benefits/tradeoffs
# of enabling this option.
# Values: 1 - Enabled tweaks
#         0 - Disable tweaks (default)

use_large_installation_tweaks=0



# ENABLE ENVIRONMENT MACROS
# This option determines whether or not Nagios will make all standard
# macros available as environment variables when host/service checks
# and system commands (event handlers, notifications, etc.) are
# executed.  Enabling this option can cause performance issues in 
# large installations, as it will consume a bit more memory and (more
# importantly) consume more CPU.
# Values: 1 - Enable environment variable macros (default)
#         0 - Disable environment variable macros

enable_environment_macros=1



# CHILD PROCESS MEMORY OPTION
# This option determines whether or not Nagios will free memory in
# child processes (processes used to execute system commands and host/
# service checks).  If you specify a value here, it will override
# program defaults.
# Value: 1 - Free memory in child processes
#        0 - Do not free memory in child processes

#free_child_process_memory=1



# CHILD PROCESS FORKING BEHAVIOR
# This option determines how Nagios will fork child processes
# (used to execute system commands and host/service checks).  Normally
# child processes are fork()ed twice, which provides a very high level
# of isolation from problems.  Fork()ing once is probably enough and will
# save a great deal on CPU usage (in large installs), so you might
# want to consider using this.  If you specify a value here, it will
# override program defaults.
# Value: 1 - Child processes fork() twice
#        0 - Child processes fork() just once

#child_processes_fork_twice=1



# DEBUG LEVEL
# This option determines how much (if any) debugging information will
# be written to the debug file.  OR values together to log multiple
# types of information.
# Values: 
#          -1 = Everything
#          0 = Nothing
#          1 = Functions
#          2 = Configuration
#          4 = Process information
#          8 = Scheduled events
#          16 = Host/service checks
#          32 = Notifications
#          64 = Event broker
#          128 = External commands
#          256 = Commands
#          512 = Scheduled downtime
#          1024 = Comments
#          2048 = Macros

#debug_level=2
debug_level=8



# DEBUG VERBOSITY
# This option determines how verbose the debug log output will be.
# Values: 0 = Brief output
#         1 = More detailed
#         2 = Very detailed

debug_verbosity=1


# DEBUG FILE
# This option determines where Nagios should write debugging information.

debug_file=/var/nagios/nagios.debug



# MAX DEBUG FILE SIZE
# This option determines the maximum size (in bytes) of the debug file.  If
# the file grows larger than this size, it will be renamed with a .old
# extension.  If a file already exists with a .old extension it will
# automatically be deleted.  This helps ensure your disk space usage doesn't
# get out of control when debugging Nagios.

max_debug_file_size=1000000
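
The DEBUG LEVEL comment in the config above says the values can be ORed together. Here is a minimal Python sketch of that arithmetic, using the bit values listed in the comment. The short flag names (`events`, `checks`, and so on) are my own labels for illustration, not Nagios identifiers.

```python
# Bit values copied from the DEBUG LEVEL comment in nagios.cfg.
# The dictionary keys are illustrative labels, not Nagios option names.
DEBUG_FLAGS = {
    "functions": 1,
    "configuration": 2,
    "process": 4,
    "events": 8,
    "checks": 16,
    "notifications": 32,
    "event_broker": 64,
    "external_commands": 128,
    "commands": 256,
    "downtime": 512,
    "comments": 1024,
    "macros": 2048,
}


def debug_level(*names):
    """OR the selected flags together into one debug_level value."""
    level = 0
    for name in names:
        level |= DEBUG_FLAGS[name]
    return level


def decode(level):
    """List the flag labels that are set in a debug_level value."""
    return [name for name, bit in DEBUG_FLAGS.items() if level & bit]


if __name__ == "__main__":
    # Scheduled events (8) plus host/service checks (16).
    print(debug_level("events", "checks"))
    # The value currently set in the config above.
    print(decode(8))
```

With the current debug_level=8 only scheduled events are logged; adding host/service checks would give 24.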

services/clients.cfg:

define service {
        hostgroup_name                  hostgroup1, hostgroup2
        service_description             PING
        check_command                   check-ping-1!50:1%!100:3%!30!45
        is_volatile                     1
        max_check_attempts              3
        check_interval                  4
        retry_interval                  1
        passive_checks_enabled          1
        check_period                    24x7
        check_freshness                 1
        event_handler_enabled           1
        flap_detection_enabled          1
        notification_interval           60
        notification_period             24x7
        notification_options            u,r,c
        contact_groups                  Support
        register                        1
}
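
To make the expected timing explicit: check_interval and retry_interval are counted in units of interval_length, which is 60 seconds in the nagios.cfg above. A tiny sketch of the conversion follows; the function name is just for illustration.

```python
def interval_seconds(units, interval_length=60):
    """Convert a check_interval/retry_interval value into seconds.

    interval_length is the nagios.cfg setting (60 in the config above).
    """
    return units * interval_length


if __name__ == "__main__":
    # Values from the PING service definition above.
    print("check_interval:", interval_seconds(4), "seconds")
    print("retry_interval:", interval_seconds(1), "seconds")
```

So the PING service should be rescheduled roughly every 240 seconds, which is what I am not seeing.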

nagiostats:

[code]Nagios Stats 3.2.1
Copyright © 2003-2008 Ethan Galstad (www.nagios.org)
Last Modified: 03-09-2010
License: GPL

CURRENT STATUS DATA

Status File: /var/nagios/status.dat
Status File Age: 0d 0h 0m 38s
Status File Version: 3.2.1

Program Running Time: 0d 0h 0m 39s
Nagios PID: 32678
Used/High/Total Command Buffers: 0 / 0 / 4096

Total Services: 1847
Services Checked: 1846
Services Scheduled: 1842
Services Actively Checked: 1846
Services Passively Checked: 1
Total Service State Change: 0.000 / 37.890 / 0.303 %
Active Service Latency: 0.000 / 358.657 / 35.125 sec
Active Service Execution Time: 0.000 / 100.005 / 11.427 sec
Active Service State Change: 0.000 / 37.890 / 0.303 %
Active Services Last 1/5/15/60 min: 0 / 37 / 43 / 157
Passive Service Latency: 0.256 / 0.256 / 0.256 sec
Passive Service State Change: 0.000 / 0.000 / 0.000 %
Passive Services Last 1/5/15/60 min: 0 / 0 / 0 / 0
Services Ok/Warn/Unk/Crit: 1664 / 30 / 17 / 136
Services Flapping: 1
Services In Downtime: 0

Total Hosts: 813
Hosts Checked: 813
Hosts Scheduled: 808
Hosts Actively Checked: 812
Host Passively Checked: 1
Total Host State Change: 0.000 / 38.550 / 0.136 %
Active Host Latency: 0.000 / 18.635 / 8.857 sec
Active Host Execution Time: 0.000 / 20.005 / 3.682 sec
Active Host State Change: 0.000 / 38.550 / 0.136 %
Active Hosts Last 1/5/15/60 min: 0 / 0 / 810 / 810
Passive Host Latency: 0.256 / 0.256 / 0.256 sec
Passive Host State Change: 0.000 / 0.000 / 0.000 %
Passive Hosts Last 1/5/15/60 min: 0 / 0 / 0 / 0
Hosts Up/Down/Unreach: 744 / 66 / 3
Hosts Flapping: 1
Hosts In Downtime: 0

Active Host Checks Last 1/5/15 min: 0 / 0 / 0
Scheduled: 0 / 0 / 0
On-demand: 0 / 0 / 0
Parallel: 0 / 0 / 0
Serial: 0 / 0 / 0
Cached: 0 / 0 / 0
Passive Host Checks Last 1/5/15 min: 0 / 0 / 0
Active Service Checks Last 1/5/15 min: 0 / 0 / 0
Scheduled: 0 / 0 / 0
On-demand: 0 / 0 / 0
Cached: 0 / 0 / 0
Passive Service Checks Last 1/5/15 min: 0 / 0 / 0

External Commands Last 1/5/15 min: 0 / 0 / 0
[/code]
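
For comparing runs, the "min / max / avg" lines in the output above can be pulled out with a short script. This is only a sketch that assumes the text layout shown above; the function name and the regular expression are mine.

```python
import re

# Matches lines such as "Active Service Latency: 0.000 / 358.657 / 35.125 sec".
# Lines with four values or without a unit (sec or %) are ignored on purpose.
TRIPLE = re.compile(
    r"^(?P<name>[^:]+):\s+"
    r"(?P<min>[\d.]+) / (?P<max>[\d.]+) / (?P<avg>[\d.]+) (?P<unit>sec|%)$"
)


def parse_min_max_avg(output):
    """Return {stat name: (min, max, avg, unit)} for every matching line."""
    stats = {}
    for line in output.splitlines():
        match = TRIPLE.match(line.strip())
        if match:
            stats[match.group("name")] = (
                float(match.group("min")),
                float(match.group("max")),
                float(match.group("avg")),
                match.group("unit"),
            )
    return stats


if __name__ == "__main__":
    sample = """\
Active Service Latency: 0.000 / 358.657 / 35.125 sec
Active Services Last 1/5/15/60 min: 0 / 37 / 43 / 157
Active Host Latency: 0.000 / 18.635 / 8.857 sec
"""
    for name, values in parse_min_max_avg(sample).items():
        print(name, values)
```

In the output above the maximum active service latency is 358.657 seconds, well over the 240-second check interval.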

Any ideas?


#2

In nagios.log there are many lines like this for various hosts:

[1273914522] Warning: Service performance data command 'echo -e 'BGP-Client1\tBGP\tBGP OK - 6\tiso.3.6.1.2.1.15.3.1.2.312.22.23.24=6' > /var/nagios/ngraph.pipe' for service 'BGP' on host 'BGP-Client1' timed out after 5 seconds
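
The warning shows the perfdata command redirecting into /var/nagios/ngraph.pipe and hitting the 5-second perfdata_timeout. One possible explanation (an assumption, not something confirmed in this thread) is that nothing is reading from that pipe, since opening a FIFO for writing blocks until a reader has it open. A minimal Python sketch to test for a reader without blocking, on Linux/Unix:

```python
import errno
import os


def fifo_has_reader(path):
    """Return True if some process has the FIFO at `path` open for reading.

    A non-blocking open for writing fails with ENXIO when the FIFO has
    no reader, so the call returns immediately either way.
    """
    try:
        fd = os.open(path, os.O_WRONLY | os.O_NONBLOCK)
    except OSError as exc:
        if exc.errno == errno.ENXIO:
            return False
        raise
    os.close(fd)
    return True


if __name__ == "__main__":
    pipe = "/var/nagios/ngraph.pipe"  # path taken from the warning above
    if os.path.exists(pipe):
        print(pipe, "has a reader:", fifo_has_reader(pipe))
    else:
        print(pipe, "does not exist on this machine")
```

If this reports no reader, the process that is supposed to consume the pipe is probably not running.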

nagios.debug:

[1273914604.021989] [008.1] [pid=32678] Next Low Priority Event Time: Sat May 15 12:09:38 2010
[1273914604.021998] [008.1] [pid=32678] Current/Max Service Checks: 453/0
[1273914604.022015] [008.0] [pid=32678] ** Timed Event ** Type: 5, Run Time: Sat May 15 12:10:04 2010
[1273914604.022025] [008.0] [pid=32678] ** Check Result Reaper
[1273914640.140366] [008.1] [pid=32678] ** Event Check Loop
[1273914640.140403] [008.1] [pid=32678] Next High Priority Event Time: Sat May 15 12:10:04 2010
[1273914640.140416] [008.1] [pid=32678] Next Low Priority Event Time: Sat May 15 12:09:38 2010
[1273914640.140424] [008.1] [pid=32678] Current/Max Service Checks: 447/0
[1273914640.140438] [008.0] [pid=32678] ** Timed Event ** Type: 8, Run Time: Sat May 15 12:10:04 2010
[1273914640.140447] [008.0] [pid=32678] ** Status Data Save Event
[1273914640.227312] [008.1] [pid=32678] ** Event Check Loop
[1273914640.227338] [008.1] [pid=32678] Next High Priority Event Time: Sat May 15 12:10:12 2010
[1273914640.227350] [008.1] [pid=32678] Next Low Priority Event Time: Sat May 15 12:09:38 2010
[1273914640.227359] [008.1] [pid=32678] Current/Max Service Checks: 447/0
[1273914640.227370] [008.0] [pid=32678] ** Timed Event ** Type: 6, Run Time: Sat May 15 12:10:12 2010
[1273914640.227379] [008.0] [pid=32678] ** Orphaned Host and Service Check Event
[1273914640.227486] [008.1] [pid=32678] ** Event Check Loop
[1273914640.227496] [008.1] [pid=32678] Next High Priority Event Time: Sat May 15 12:10:12 2010
[1273914640.227507] [008.1] [pid=32678] Next Low Priority Event Time: Sat May 15 12:09:38 2010
[1273914640.227515] [008.1] [pid=32678] Current/Max Service Checks: 447/0
[1273914640.227526] [008.0] [pid=32678] ** Timed Event ** Type: 13, Run Time: Sat May 15 12:10:12 2010
[1273914640.227534] [008.0] [pid=32678] ** Host Result Freshness Check Event
[1273914640.230346] [008.1] [pid=32678] ** Event Check Loop
[1273914640.230359] [008.1] [pid=32678] Next High Priority Event Time: Sat May 15 12:10:12 2010
[1273914640.230370] [008.1] [pid=32678] Next Low Priority Event Time: Sat May 15 12:09:38 2010
[1273914640.230378] [008.1] [pid=32678] Current/Max Service Checks: 447/0
[1273914640.230392] [008.0] [pid=32678] ** Timed Event ** Type: 1, Run Time: Sat May 15 12:10:12 2010
[1273914640.230401] [008.0] [pid=32678] ** External Command Check Event
[1273914640.230414] [008.1] [pid=32678] ** Event Check Loop
[1273914640.230423] [008.1] [pid=32678] Next High Priority Event Time: Sat May 15 12:10:40 2010
[1273914640.230433] [008.1] [pid=32678] Next Low Priority Event Time: Sat May 15 12:09:38 2010
[1273914640.230441] [008.1] [pid=32678] Current/Max Service Checks: 447/0
[1273914640.230452] [008.0] [pid=32678] ** Timed Event ** Type: 5, Run Time: Sat May 15 12:10:40 2010
[1273914640.230460] [008.0] [pid=32678] ** Check Result Reaper
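
The timestamps in this excerpt jump by about 36 seconds between the "Check Result Reaper" entry (1273914604.022025) and the next "Event Check Loop" entry (1273914640.140366). A rough Python sketch to find such stalls in a debug file follows. It assumes the `[timestamp] [level.verbosity] [pid=N] message` entry layout seen above, and the 5-second threshold is an arbitrary choice of mine.

```python
import re

# Header of one debug entry: "[1273914604.022025] [008.0] [pid=32678] ".
STAMP = re.compile(r"\[(\d{10}\.\d{6})\] \[\d{3}\.\d\] \[pid=\d+\] ")


def find_stalls(text, threshold=5.0):
    """Return (message, gap_seconds) for entries followed by a long pause.

    Works whether entries are one per line or run together on one line.
    """
    matches = list(STAMP.finditer(text))
    stalls = []
    for current, following in zip(matches, matches[1:]):
        gap = float(following.group(1)) - float(current.group(1))
        if gap >= threshold:
            message = text[current.end():following.start()].strip()
            stalls.append((message, round(gap, 3)))
    return stalls


if __name__ == "__main__":
    sample = (
        "[1273914604.022015] [008.0] [pid=32678] ** Timed Event ** Type: 5, "
        "Run Time: Sat May 15 12:10:04 2010\n"
        "[1273914604.022025] [008.0] [pid=32678] ** Check Result Reaper\n"
        "[1273914640.140366] [008.1] [pid=32678] ** Event Check Loop\n"
    )
    for message, gap in find_stalls(sample):
        print(f"{gap:8.3f} s after: {message}")
```

On the excerpt above this flags the Check Result Reaper event, which runs longer than the max_check_result_reaper_time=30 set in nagios.cfg.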

Another, more interesting excerpt from nagios.debug:

[1273914748.459106] [008.1] [pid=5547] Next Low Priority Event Time: Sat May 15 12:12:11 2010
[1273914748.459114] [008.1] [pid=5547] Current/Max Service Checks: 0/0
[1273914748.459125] [008.1] [pid=5547] Running event...
[1273914748.459166] [008.0] [pid=5547] ** Timed Event ** Type: 12, Run Time: Sat May 15 12:12:11 2010
[1273914748.459177] [008.0] [pid=5547] ** Host Check Event ==> Host: 'Wireless-Client1', Options: 0, Latency: 17.459000 sec
[1273914748.477456] [008.1] [pid=5547] ** Event Check Loop
[1273914748.477531] [008.1] [pid=5547] Next High Priority Event Time: Sat May 15 12:12:31 2010
[1273914748.477544] [008.1] [pid=5547] Next Low Priority Event Time: Sat May 15 12:12:11 2010
[1273914748.477552] [008.1] [pid=5547] Current/Max Service Checks: 0/0
[1273914748.477563] [008.1] [pid=5547] Running event...
[1273914748.477574] [008.0] [pid=5547] ** Timed Event ** Type: 12, Run Time: Sat May 15 12:12:11 2010
[1273914748.477584] [008.0] [pid=5547] ** Host Check Event ==> Host: 'Wireless-Client2', Options: 0, Latency: 17.477000 sec
[1273914748.496060] [008.1] [pid=5547] ** Event Check Loop
[1273914748.496132] [008.1] [pid=5547] Next High Priority Event Time: Sat May 15 12:12:31 2010
[1273914748.496145] [008.1] [pid=5547] Next Low Priority Event Time: Sat May 15 12:12:11 2010
[1273914748.496153] [008.1] [pid=5547] Current/Max Service Checks: 0/0
[1273914748.496163] [008.1] [pid=5547] Running event...
[1273914748.496175] [008.0] [pid=5547] ** Timed Event ** Type: 12, Run Time: Sat May 15 12:12:11 2010
[1273914748.496186] [008.0] [pid=5547] ** Host Check Event ==> Host: 'Wireless-Client3', Options: 0, Latency: 17.496000 sec
[1273914748.514734] [008.1] [pid=5547] ** Event Check Loop
[1273914748.514808] [008.1] [pid=5547] Next High Priority Event Time: Sat May 15 12:12:31 2010
[1273914748.514821] [008.1] [pid=5547] Next Low Priority Event Time: Sat May 15 12:12:11 2010
[1273914748.514829] [008.1] [pid=5547] Current/Max Service Checks: 0/0
[1273914748.514839] [008.1] [pid=5547] Running event...
[1273914748.514851] [008.0] [pid=5547] ** Timed Event ** Type: 12, Run Time: Sat May 15 12:12:11 2010
[1273914748.514861] [008.0] [pid=5547] ** Host Check Event ==> Host: 'Wireless-Client4', Options: 0, Latency: 17.514000 sec
[1273914748.533188] [008.1] [pid=5547] ** Event Check Loop
[1273914748.533261] [008.1] [pid=5547] Next High Priority Event Time: Sat May 15 12:12:31 2010
[1273914748.533274] [008.1] [pid=5547] Next Low Priority Event Time: Sat May 15 12:12:11 2010
[1273914748.533282] [008.1] [pid=5547] Current/Max Service Checks: 0/0
[1273914748.533292] [008.1] [pid=5547] Running event...
[1273914748.533304] [008.0] [pid=5547] ** Timed Event ** Type: 12, Run Time: Sat May 15 12:12:11 2010
[1273914748.533314] [008.0] [pid=5547] ** Host Check Event ==> Host: 'Wireless-Client5', Options: 0, Latency: 17.533000 sec
[1273914748.551718] [008.1] [pid=5547] ** Event Check Loop
[1273914748.551790] [008.1] [pid=5547] Next High Priority Event Time: Sat May 15 12:12:31 2010


#3

After stopping and starting nagios:

[1273924279.842014] [008.0] [pid=3888] ** Service Check Event ==> Host: 'Switch-Client1', Service: 'PING', Options: 0, Latency: 7.842000 sec

In the web interface:
Last Check:
2010-05-13 21:13:09

Scheduled checks still don’t work…
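
To compare what the debug log says with the "Last Check" shown in the web interface, the check events can be pulled out of nagios.debug with a small script. This is only a rough sketch in Python, assuming the entries look exactly like the excerpts pasted in this thread (a Unix epoch timestamp, a level, a pid, then the message). The helper names split_entries and check_events are made up for this example and are not part of Nagios.

```python
import re
from datetime import datetime, timezone

# Header of one debug entry as seen in the excerpts: [epoch.micros] [level] [pid=N]
ENTRY = re.compile(r"\[(\d+\.\d+)\] \[(\d+\.\d+)\] \[pid=(\d+)\] ")

# Host/service check events; accepts straight or curly quotes around names,
# because forum software may have converted the quotes in pasted logs.
CHECK = re.compile(
    r"\*\* (Host|Service) Check Event ==> Host: ['‘]([^'’]+)['’]"
    r"(?:, Service: ['‘]([^'’]+)['’])?.*?Latency: ([\d.]+) sec"
)


def split_entries(text):
    """Split concatenated debug entries into (epoch, pid, message) tuples."""
    matches = list(ENTRY.finditer(text))
    entries = []
    for i, m in enumerate(matches):
        # The message runs until the next entry header (or the end of the text).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        entries.append((float(m.group(1)), int(m.group(3)), text[m.end():end].strip()))
    return entries


def check_events(text):
    """Return (utc_time, kind, host, service, latency_sec) per check event."""
    events = []
    for epoch, _pid, message in split_entries(text):
        m = CHECK.search(message)
        if m:
            when = datetime.fromtimestamp(epoch, tz=timezone.utc)
            events.append((when, m.group(1), m.group(2), m.group(3), float(m.group(4))))
    return events
```

Passing the contents of nagios.debug to check_events gives one tuple per host or service check event, with the timestamp converted to UTC, so the time of the most recent event for a host can be set against the "Last Check" value from the web interface. The service field is None for host check events.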