50% drop in performance

Hi there,

I’m seeing a strange performance problem on Nagios that I can’t figure out. On Saturday I planned to install PNP4 (to graph results in Nagios). Before starting on the prerequisites I took a full backup.

In anticipation of the PNP4 install I extended the disk space available to Nagios, gave it extra memory, and then upgraded PHP.

When I brought Nagios back up, performance had dropped by around 50%, and I can’t work out why. Please take a look at the attached graphs for a better idea of what’s going on with the system. It was working fine before the system went down; since it came back, performance has been crippled: half as many host checks complete in the same time, and check latency has gone through the roof.




Any ideas?

Regards,
Si

Hi,
Can you please post the nagios.cfg and performance data?
Thanks!

Of course, sneakymonkey, although I don’t think the problem lies here: nothing in the config was changed before Nagios was shut down. Only the disk space allocation and the memory were increased.

Could an increase in memory potentially cause some unexpected drop in performance?

See below:

OBJECT CONFIG PROCESSING TIMES (* = Potential for precache savings with -u option)

Read: 0.029723 sec
Resolve: 0.002292 sec *
Recomb Contactgroups: 0.000132 sec *
Recomb Hostgroups: 0.001787 sec *
Dup Services: 0.023079 sec *
Recomb Servicegroups: 0.017388 sec *
Duplicate: 0.029023 sec *
Inherit: 0.002353 sec *
Recomb Contacts: 0.000000 sec *
Sort: 0.000001 sec *
Register: 0.054468 sec
Free: 0.001447 sec
============
TOTAL: 0.161694 sec * = 0.076056 sec (47.04%) estimated savings

RETENTION DATA TIMES

Read and Process: 0.225259 sec
============
TOTAL: 0.225259 sec

Timing information on configuration verification is listed below.

CONFIG VERIFICATION TIMES (* = Potential for speedup with -x option)

Object Relationships: 0.031028 sec
Circular Paths: 0.511448 sec *
Misc: 0.002778 sec
============
TOTAL: 0.545254 sec * = 0.511448 sec (93.8%) estimated savings
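
As a rough cross-check of the "estimated savings" figures above, the starred timings can be summed and compared against the totals. This is only a sketch using the numbers pasted in this post; the variable and function names are made up for illustration.

```python
# Timings (seconds) copied from the output above; the starred rows are the
# ones the output marks as avoidable.
starred_object_times = [
    0.002292,  # Resolve
    0.000132,  # Recomb Contactgroups
    0.001787,  # Recomb Hostgroups
    0.023079,  # Dup Services
    0.017388,  # Recomb Servicegroups
    0.029023,  # Duplicate
    0.002353,  # Inherit
    0.000000,  # Recomb Contacts
    0.000001,  # Sort
]
object_total = 0.161694
circular_paths = 0.511448
verify_total = 0.545254


def savings_percent(saved, total):
    """Share of the total time that the starred steps account for."""
    return 100.0 * saved / total


precache_saved = sum(starred_object_times)
print(f"precache savings: {precache_saved:.6f} s "
      f"({savings_percent(precache_saved, object_total):.2f}%)")
print(f"circular path check: "
      f"{savings_percent(circular_paths, verify_total):.2f}% of verification")
```

All of these are one-off startup costs that add up to well under a second, so on their own they seem unlikely to account for a sustained rise in check latency.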

EVENT SCHEDULING TIMES

Get service info: 0.009585 sec
Get host info info: 0.001198 sec
Get service params: 0.000059 sec
Schedule service times: 0.020233 sec
Schedule service events: 0.008208 sec
Get host params: 0.000001 sec
Schedule host times: 0.002888 sec
Schedule host events: 0.002470 sec
============
TOTAL: 0.044642 sec

Projected scheduling information for host and service checks
is listed below. This information assumes that you are going
to start running Nagios with your current config files.

HOST SCHEDULING INFORMATION

Total hosts: 298
Total scheduled hosts: 297
Host inter-check delay method: SMART
Average host check interval: 80.05 sec
Host inter-check delay: 0.27 sec
Max host check spread: 3 min
First scheduled check: Wed Feb 17 14:40:30 2010
Last scheduled check: Wed Feb 17 14:41:49 2010

SERVICE SCHEDULING INFORMATION

Total services: 2137
Total scheduled services: 2124
Service inter-check delay method: SMART
Average service check interval: 61.48 sec
Inter-check delay: 0.03 sec
Interleave factor method: SMART
Average services per host: 7.17
Service interleave factor: 8
Max service check spread: 4 min
First scheduled check: Wed Feb 17 14:40:37 2010
Last scheduled check: Wed Feb 17 14:41:38 2010

CHECK PROCESSING INFORMATION

Check result reaper interval: 2 sec
Max concurrent service checks: Unlimited

PERFORMANCE SUGGESTIONS

I have no suggestions - things look okay.
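
The scheduling figures above can be sanity-checked with a little arithmetic. The sketch below assumes the "smart" method simply spreads one average check interval across all scheduled checks, capped by the maximum spread. That formula is an assumption on my part rather than something taken from the Nagios source, although it does reproduce the delays reported above. `smart_delay` is a made-up name.

```python
def smart_delay(avg_interval_s, scheduled_checks, max_spread_min):
    """Approximate inter-check delay in seconds (assumed formula)."""
    delay = avg_interval_s / scheduled_checks
    cap = (max_spread_min * 60.0) / scheduled_checks
    return min(delay, cap)


# Figures copied from the HOST and SERVICE SCHEDULING INFORMATION blocks.
host_delay = smart_delay(80.05, 297, 3)
service_delay = smart_delay(61.48, 2124, 4)

# Checks per second Nagios has to sustain to stay on schedule.
host_rate = 297 / 80.05
service_rate = 2124 / 61.48

print(f"host inter-check delay:    {host_delay:.2f} s")
print(f"service inter-check delay: {service_delay:.2f} s")
print(f"required rate: {host_rate:.1f} host + {service_rate:.1f} service checks/s")
```

If the server can only execute about half that rate after the change, checks queue up and latency climbs, which would match the symptoms described at the top of the thread.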

CFG:
##############################################################################

NAGIOS.CFG - Sample Main Config File for Nagios 3.0.6

Read the documentation for more information on this configuration

file. I’ve provided some comments here, but things may not be so

clear without further explanation.

Last Modified: 10-15-2008

##############################################################################

LOG FILE

This is the main log file where service and host events are logged

for historical purposes. This should be the first option specified

in the config file!!!

log_file=/usr/local/nagios/var/nagios.log

OBJECT CONFIGURATION FILE(S)

These are the object configuration files in which you define hosts,

host groups, contacts, contact groups, services, etc.

You can split your object definitions across several config files

if you wish (as shown below), or keep them all in a single config file.

You can specify individual object config files as shown below:

cfg_file=/usr/local/nagios/etc//commands.cfg
cfg_file=/usr/local/nagios/etc//contacts.cfg
cfg_file=/usr/local/nagios/etc//timeperiods.cfg
cfg_file=/usr/local/nagios/etc//templates.cfg
cfg_file=/usr/local/nagios/etc//groups.cfg
cfg_file=/usr/local/nagios/etc//globalconfig.cfg

You can also tell Nagios to process all config files (with a .cfg

extension) in a particular directory by using the cfg_dir

directive as shown below:

cfg_dir=/usr/local/nagios/etc//MTP/
cfg_dir=/usr/local/nagios/etc//LHS/
cfg_dir=/usr/local/nagios/etc//MAY/
cfg_dir=/usr/local/nagios/etc//BPW/
cfg_dir=/usr/local/nagios/etc//DSH/
cfg_dir=/usr/local/nagios/etc//NGM/
cfg_dir=/usr/local/nagios/etc//SSC/
cfg_dir=/usr/local/nagios/etc//PLT/

OBJECT CACHE FILE

This option determines where object definitions are cached when

Nagios starts/restarts. The CGIs read object definitions from

this cache file (rather than looking at the object config files

directly) in order to prevent inconsistencies that can occur

when the config files are modified after Nagios starts.

object_cache_file=/usr/local/nagios/var/objects.cache

PRE-CACHED OBJECT FILE

This option determines the location of the precached object file.

If you run Nagios with the -p command line option, it will preprocess

your object configuration file(s) and write the cached config to this

file. You can then start Nagios with the -u option to have it read

object definitions from this precached file, rather than the standard

object configuration files (see the cfg_file and cfg_dir options above).

Using a precached object file can speed up the time needed to (re)start

the Nagios process if you’ve got a large and/or complex configuration.

Read the documentation section on optimizing Nagios to find out more

about how this feature works.

precached_object_file=/usr/local/nagios/var/objects.precache

RESOURCE FILE

This is an optional resource file that contains $USERx$ macro

definitions. Multiple resource files can be specified by using

multiple resource_file definitions. The CGIs will not attempt to

read the contents of resource files, so information that is

considered to be sensitive (usernames, passwords, etc) can be

defined as macros in this file and restrictive permissions (600)

can be placed on this file.

resource_file=/usr/local/nagios/etc/resource.cfg

STATUS FILE

This is where the current status of all monitored services and

hosts is stored. Its contents are read and processed by the CGIs.

The contents of the status file are deleted every time Nagios

restarts.

status_file=/usr/local/nagios/var/status.dat

STATUS FILE UPDATE INTERVAL

This option determines the frequency (in seconds) that

Nagios will periodically dump program, host, and

service status data.

status_update_interval=10

NAGIOS USER

This determines the effective user that Nagios should run as.

You can either supply a username or a UID.

nagios_user=nagios

NAGIOS GROUP

This determines the effective group that Nagios should run as.

You can either supply a group name or a GID.

nagios_group=nagios

EXTERNAL COMMAND OPTION

This option allows you to specify whether or not Nagios should check

for external commands (in the command file defined below). By default

Nagios will not check for external commands, just to be on the

cautious side. If you want to be able to use the CGI command interface

you will have to enable this.

Values: 0 = disable commands, 1 = enable commands

check_external_commands=1

EXTERNAL COMMAND CHECK INTERVAL

This is the interval at which Nagios should check for external commands.

This value works off the interval_length you specify later. If you leave

that at its default value of 60 (seconds), a value of 1 here will cause

Nagios to check for external commands every minute. If you specify a

number followed by an “s” (i.e. 15s), this will be interpreted to mean

actual seconds rather than a multiple of the interval_length variable.

Note: In addition to reading the external command file at regularly

scheduled intervals, Nagios will also check for external commands after

event handlers are executed.

NOTE: Setting this value to -1 causes Nagios to check the external

command file as often as possible.

#command_check_interval=15s
command_check_interval=-1
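
The three forms this value can take, as described in the comment above, can be sketched as follows. This illustrates the rule in the comment only and is not Nagios code; `command_check_seconds` is a hypothetical helper.

```python
def command_check_seconds(value, interval_length=60):
    """Return the polling period in seconds for a command_check_interval value.

    Returns None for -1, meaning "check as often as possible".
    """
    value = value.strip()
    if value == "-1":
        return None
    if value.endswith("s"):
        # A trailing "s" means literal seconds, e.g. "15s".
        return float(value[:-1])
    # Otherwise the value is a multiple of interval_length.
    return float(value) * interval_length


print(command_check_seconds("15s"))     # the commented-out setting
print(command_check_seconds("-1"))      # the active setting in this file
print(command_check_seconds("1", 15))   # 1 unit with interval_length=15
```

With the active setting of -1, Nagios polls the command file as often as it can.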

EXTERNAL COMMAND FILE

This is the file that Nagios checks for external command requests.

It is also where the command CGI will write commands that are submitted

by users, so it must be writeable by the user that the web server

is running as (usually ‘nobody’). Permissions should be set at the

directory level instead of on the file, as the file is deleted every

time its contents are processed.

command_file=/usr/local/nagios/var/rw/nagios.cmd

EXTERNAL COMMAND BUFFER SLOTS

This setting is used to tweak the number of items or “slots” that

the Nagios daemon should allocate to the buffer that holds incoming

external commands before they are processed. As external commands

are processed by the daemon, they are removed from the buffer.

external_command_buffer_slots=4096

LOCK FILE

This is the lockfile that Nagios will use to store its PID number

in when it is running in daemon mode.

lock_file=/usr/local/nagios/var/nagios.lock

TEMP FILE

This is a temporary file that is used as scratch space when Nagios

updates the status log, cleans the comment file, etc. This file

is created, used, and deleted throughout the time that Nagios is

running.

temp_file=/usr/local/nagios/var/nagios.tmp

TEMP PATH

This is the path where Nagios can create temp files for service and

host check results, etc.

temp_path=/tmp

EVENT BROKER OPTIONS

Controls what (if any) data gets sent to the event broker.

Values: 0 = Broker nothing

-1 = Broker everything

<other> = See documentation

event_broker_options=-1

EVENT BROKER MODULE(S)

This directive is used to specify an event broker module that should

be loaded by Nagios at startup. Use multiple directives if you want

to load more than one module. Arguments that should be passed to

the module at startup are separated from the module path by a space.

#!!!

WARNING !!! WARNING !!! WARNING !!! WARNING !!! WARNING !!! WARNING

#!!!

Do NOT overwrite modules while they are being used by Nagios or Nagios

will crash in a fiery display of SEGFAULT glory. This is a bug/limitation

either in dlopen(), the kernel, and/or the filesystem. And maybe Nagios…

The correct/safe way of updating a module is by using one of these methods:

1. Shutdown Nagios, replace the module file, restart Nagios

2. Delete the original module file, move the new module file into place, restart Nagios

Example:

broker_module=<modulepath> [moduleargs]

#broker_module=/somewhere/module1.o
#broker_module=/somewhere/module2.o arg1 arg2=3 debug=0

LOG ROTATION METHOD

This is the log rotation method that Nagios should use to rotate

the main log file. Values are as follows…

n = None - don’t rotate the log

h = Hourly rotation (top of the hour)

d = Daily rotation (midnight every day)

w = Weekly rotation (midnight on Saturday evening)

m = Monthly rotation (midnight last day of month)

log_rotation_method=d

LOG ARCHIVE PATH

This is the directory where archived (rotated) log files should be

placed (assuming you’ve chosen to do log rotation).

log_archive_path=/usr/local/nagios/var/archives

LOGGING OPTIONS

If you want messages logged to the syslog facility, as well as the

Nagios log file set this option to 1. If not, set it to 0.

use_syslog=0

NOTIFICATION LOGGING OPTION

If you don’t want notifications to be logged, set this value to 0.

If notifications should be logged, set the value to 1.

log_notifications=1

SERVICE RETRY LOGGING OPTION

If you don’t want service check retries to be logged, set this value

to 0. If retries should be logged, set the value to 1.

log_service_retries=1

HOST RETRY LOGGING OPTION

If you don’t want host check retries to be logged, set this value to

0. If retries should be logged, set the value to 1.

log_host_retries=1

EVENT HANDLER LOGGING OPTION

If you don’t want host and service event handlers to be logged, set

this value to 0. If event handlers should be logged, set the value

to 1.

log_event_handlers=1

INITIAL STATES LOGGING OPTION

If you want Nagios to log all initial host and service states to

the main log file (the first time the service or host is checked)

you can enable this option by setting this value to 1. If you

are not using an external application that does long term state

statistics reporting, you do not need to enable this option. In

this case, set the value to 0.

log_initial_states=0

EXTERNAL COMMANDS LOGGING OPTION

If you don’t want Nagios to log external commands, set this value

to 0. If external commands should be logged, set this value to 1.

Note: This option does not include logging of passive service

checks - see the option below for controlling whether or not

passive checks are logged.

log_external_commands=1

PASSIVE CHECKS LOGGING OPTION

If you don’t want Nagios to log passive host and service checks, set

this value to 0. If passive checks should be logged, set

this value to 1.

log_passive_checks=1

GLOBAL HOST AND SERVICE EVENT HANDLERS

These options allow you to specify a host and service event handler

command that is to be run for every host or service state change.

The global event handler is executed immediately prior to the event

handler that you have optionally specified in each host or

service definition. The command argument is the short name of a

command definition that you define in your host configuration file.

Read the HTML docs for more information.

#global_host_event_handler=somecommand
#global_service_event_handler=somecommand

SERVICE INTER-CHECK DELAY METHOD

This is the method that Nagios should use when initially

“spreading out” service checks when it starts monitoring. The

default is to use smart delay calculation, which will try to

space all service checks out evenly to minimize CPU load.

Using the dumb setting will cause all checks to be scheduled

at the same time (with no delay between them)! This is not a

good thing for production, but is useful when testing the

parallelization functionality.

n = None - don’t use any delay between checks

d = Use a “dumb” delay of 1 second between checks

s = Use “smart” inter-check delay calculation

x.xx = Use an inter-check delay of x.xx seconds

service_inter_check_delay_method=s

MAXIMUM SERVICE CHECK SPREAD

This variable determines the timeframe (in minutes) from the

program start time that an initial check of all services should

be completed. Default is 30 minutes.

max_service_check_spread=4

SERVICE CHECK INTERLEAVE FACTOR

This variable determines how service checks are interleaved.

Interleaving the service checks allows for a more even

distribution of service checks and reduced load on remote

hosts. Setting this value to 1 is equivalent to how versions

of Nagios previous to 0.0.5 did service checks. Set this

value to s (smart) for automatic calculation of the interleave

factor unless you have a specific reason to change it.

s = Use “smart” interleave factor calculation

x = Use an interleave factor of x, where x is a

number greater than or equal to 1.

service_interleave_factor=s

HOST INTER-CHECK DELAY METHOD

This is the method that Nagios should use when initially

“spreading out” host checks when it starts monitoring. The

default is to use smart delay calculation, which will try to

space all host checks out evenly to minimize CPU load.

Using the dumb setting will cause all checks to be scheduled

at the same time (with no delay between them)!

n = None - don’t use any delay between checks

d = Use a “dumb” delay of 1 second between checks

s = Use “smart” inter-check delay calculation

x.xx = Use an inter-check delay of x.xx seconds

host_inter_check_delay_method=s

MAXIMUM HOST CHECK SPREAD

This variable determines the timeframe (in minutes) from the

program start time that an initial check of all hosts should

be completed. Default is 30 minutes.

max_host_check_spread=3

MAXIMUM CONCURRENT SERVICE CHECKS

This option allows you to specify the maximum number of

service checks that can be run in parallel at any given time.

Specifying a value of 1 for this variable essentially prevents

any service checks from being parallelized. A value of 0

will not restrict the number of concurrent checks that are

being executed.

max_concurrent_checks=0

HOST AND SERVICE CHECK REAPER FREQUENCY

This is the frequency (in seconds!) that Nagios will process

the results of host and service checks.

check_result_reaper_frequency=2

MAX CHECK RESULT REAPER TIME

This is the max amount of time (in seconds) that a single

check result reaper event will be allowed to run before

returning control back to Nagios so it can perform other

duties.

max_check_result_reaper_time=2

CHECK RESULT PATH

This is the directory where Nagios stores the results of host and

service checks that have not yet been processed.

Note: Make sure that only one instance of Nagios has access

to this directory!

check_result_path=/usr/local/nagios/var/spool/checkresults

MAX CHECK RESULT FILE AGE

This option determines the maximum age (in seconds) which check

result files are considered to be valid. Files older than this

threshold will be mercilessly deleted without further processing.

max_check_result_file_age=3600

CACHED HOST CHECK HORIZON

This option determines the maximum amount of time (in seconds)

that the state of a previous host check is considered current.

Cached host states (from host checks that were performed more

recently than the timeframe specified by this value) can immensely

improve performance in regards to the host check logic.

Too high of a value for this option may result in inaccurate host

states being used by Nagios, while a lower value may result in a

performance hit for host checks. Use a value of 0 to disable host

check caching.

cached_host_check_horizon=10
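
The caching rule described above boils down to an age comparison. The sketch below is an illustration of that rule, not the logic from the Nagios source. The function name is hypothetical, and treating a result exactly at the horizon as still current is an assumption.

```python
import time


def can_use_cached_state(last_check_time, horizon_s, now=None):
    """True if a previous check result is recent enough to reuse."""
    if horizon_s <= 0:
        # A horizon of 0 disables check caching.
        return False
    if now is None:
        now = time.time()
    return (now - last_check_time) <= horizon_s


# With the 10-second horizon configured here:
print(can_use_cached_state(last_check_time=100.0, horizon_s=10, now=105.0))
print(can_use_cached_state(last_check_time=100.0, horizon_s=10, now=130.0))
```

With a 10-second horizon and an average host check interval of about 80 seconds, most on-demand host checks would find no fresh cached result and would have to run a real check.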

CACHED SERVICE CHECK HORIZON

This option determines the maximum amount of time (in seconds)

that the state of a previous service check is considered current.

Cached service states (from service checks that were performed more

recently than the timeframe specified by this value) can immensely

improve performance in regards to predictive dependency checks.

Use a value of 0 to disable service check caching.

cached_service_check_horizon=10

ENABLE PREDICTIVE HOST DEPENDENCY CHECKS

This option determines whether or not Nagios will attempt to execute

checks of hosts when it predicts that future dependency logic test

may be needed. These predictive checks can help ensure that your

host dependency logic works well.

Values:

0 = Disable predictive checks

1 = Enable predictive checks (default)

enable_predictive_host_dependency_checks=1

ENABLE PREDICTIVE SERVICE DEPENDENCY CHECKS

This option determines whether or not Nagios will attempt to execute

checks of service when it predicts that future dependency logic test

may be needed. These predictive checks can help ensure that your

service dependency logic works well.

Values:

0 = Disable predictive checks

1 = Enable predictive checks (default)

enable_predictive_service_dependency_checks=1

SOFT STATE DEPENDENCIES

This option determines whether or not Nagios will use soft state

information when checking host and service dependencies. Normally

Nagios will only use the latest hard host or service state when

checking dependencies. If you want it to use the latest state (regardless

of whether it’s a soft or hard state type), enable this option.

Values:

0 = Don’t use soft state dependencies (default)

1 = Use soft state dependencies

soft_state_dependencies=1

TIME CHANGE ADJUSTMENT THRESHOLDS

These options determine when Nagios will react to detected changes

in system time (either forward or backwards).

#time_change_threshold=900

AUTO-RESCHEDULING OPTION

This option determines whether or not Nagios will attempt to

automatically reschedule active host and service checks to

“smooth” them out over time. This can help balance the load on

the monitoring server.

WARNING: THIS IS AN EXPERIMENTAL FEATURE - IT CAN DEGRADE

PERFORMANCE, RATHER THAN INCREASE IT, IF USED IMPROPERLY

auto_reschedule_checks=0

AUTO-RESCHEDULING INTERVAL

This option determines how often (in seconds) Nagios will

attempt to automatically reschedule checks. This option only

has an effect if the auto_reschedule_checks option is enabled.

Default is 30 seconds.

WARNING: THIS IS AN EXPERIMENTAL FEATURE - IT CAN DEGRADE

PERFORMANCE, RATHER THAN INCREASE IT, IF USED IMPROPERLY

auto_rescheduling_interval=15

AUTO-RESCHEDULING WINDOW

This option determines the “window” of time (in seconds) that

Nagios will look at when automatically rescheduling checks.

Only host and service checks that occur in the next X seconds

(determined by this variable) will be rescheduled. This option

only has an effect if the auto_reschedule_checks option is

enabled. Default is 180 seconds (3 minutes).

WARNING: THIS IS AN EXPERIMENTAL FEATURE - IT CAN DEGRADE

PERFORMANCE, RATHER THAN INCREASE IT, IF USED IMPROPERLY

auto_rescheduling_window=60

SLEEP TIME

This is the number of seconds to sleep between checking for system

events and service checks that need to be run.

sleep_time=0.25

TIMEOUT VALUES

These options control how much time Nagios will allow various

types of commands to execute before killing them off. Options

are available for controlling maximum time allotted for

service checks, host checks, event handlers, notifications, the

ocsp command, and performance data commands. All values are in

seconds.

service_check_timeout=15
host_check_timeout=30
event_handler_timeout=30
notification_timeout=30
ocsp_timeout=5
perfdata_timeout=5

RETAIN STATE INFORMATION

This setting determines whether or not Nagios will save state

information for services and hosts before it shuts down. Upon

startup Nagios will reload all saved service and host state

information before starting to monitor. This is useful for

maintaining long-term data on state statistics, etc, but will

slow Nagios down a bit when it (re)starts. Since it’s only

a one-time penalty, I think it’s well worth the additional

startup delay.

retain_state_information=1

STATE RETENTION FILE

This is the file that Nagios should use to store host and

service state information before it shuts down. The state

information in this file is also read immediately prior to

starting to monitor the network when Nagios is restarted.

This file is used only if the retain_state_information

variable is set to 1.

state_retention_file=/usr/local/nagios/var/retention.dat

RETENTION DATA UPDATE INTERVAL

This setting determines how often (in minutes) that Nagios

will automatically save retention data during normal operation.

If you set this value to 0, Nagios will not save retention

data at regular interval, but it will still save retention

data before shutting down or restarting. If you have disabled

state retention, this option has no effect.

retention_update_interval=30

USE RETAINED PROGRAM STATE

This setting determines whether or not Nagios will set

program status variables based on the values saved in the

retention file. If you want to use retained program status

information, set this value to 1. If not, set this value

to 0.

use_retained_program_state=1

USE RETAINED SCHEDULING INFO

This setting determines whether or not Nagios will retain

the scheduling info (next check time) for hosts and services

based on the values saved in the retention file.

If you want to use retained scheduling info, set this

value to 1. If not, set this value to 0.

use_retained_scheduling_info=1

RETAINED ATTRIBUTE MASKS (ADVANCED FEATURE)

The following variables are used to specify specific host and

service attributes that should not be retained by Nagios during

program restarts.

The values of the masks are bitwise ORs of values specified

by the “MODATTR_” definitions found in include/common.h.

For example, if you do not want the current enabled/disabled state

of flap detection and event handlers for hosts to be retained, you

would use a value of 24 for the host attribute mask…

MODATTR_EVENT_HANDLER_ENABLED (8) + MODATTR_FLAP_DETECTION_ENABLED (16) = 24

This mask determines what host attributes are not retained

retained_host_attribute_mask=0

This mask determines what service attributes are not retained

retained_service_attribute_mask=0

These two masks determine what process attributes are not retained.

There are two masks, because some process attributes have host and service

options. For example, you can disable active host checks, but leave active

service checks enabled.

retained_process_host_attribute_mask=0
retained_process_service_attribute_mask=0

These two masks determine what contact attributes are not retained.

There are two masks, because some contact attributes have host and

service options. For example, you can disable host notifications for

a contact, but leave service notifications enabled for them.

retained_contact_host_attribute_mask=0
retained_contact_service_attribute_mask=0
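
The mask arithmetic described above can be sketched like this. Only the two constants quoted in the comment (8 and 16) are used; `is_masked` is a made-up helper for illustration.

```python
# Values quoted in the comment above (from include/common.h).
MODATTR_EVENT_HANDLER_ENABLED = 8
MODATTR_FLAP_DETECTION_ENABLED = 16

# Combine the attributes that should NOT be retained across restarts.
# For distinct bit flags, OR-ing gives the same result as adding.
mask = MODATTR_EVENT_HANDLER_ENABLED | MODATTR_FLAP_DETECTION_ENABLED


def is_masked(mask_value, attribute):
    """True if the attribute is excluded from retention by the mask."""
    return bool(mask_value & attribute)


print(mask)
print(is_masked(mask, MODATTR_FLAP_DETECTION_ENABLED))
print(is_masked(0, MODATTR_FLAP_DETECTION_ENABLED))
```

All the masks in this file are 0, so nothing is excluded and every attribute is retained.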

INTERVAL LENGTH

This is the seconds per unit interval as used in the

host/contact/service configuration files. Setting this to 60 means

that each interval is one minute long (60 seconds). Other settings

have not been tested much, so your mileage is likely to vary…

interval_length=15
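
Because interval_length is 15 here rather than the default 60, every interval in the object definitions counts in 15-second units. A minimal sketch of the conversion follows; the value of 4 units is a hypothetical example, not taken from this configuration.

```python
def interval_seconds(units, interval_length=60):
    """Convert a check/retry interval given in units into seconds."""
    return units * interval_length


hypothetical_check_interval = 4  # assumed value, for illustration only

print(interval_seconds(hypothetical_check_interval, interval_length=15))
print(interval_seconds(hypothetical_check_interval))  # default of 60
```

The same number in a host or service definition therefore schedules checks four times as often under this file as it would under the default, which multiplies the check load accordingly.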

AGGRESSIVE HOST CHECKING OPTION

If you don’t want to turn on aggressive host checking features, set

this value to 0 (the default). Otherwise set this value to 1 to

enable the aggressive check option. Read the docs for more info

on what aggressive host check is or check out the source code in

base/checks.c

use_aggressive_host_checking=1

SERVICE CHECK EXECUTION OPTION

This determines whether or not Nagios will actively execute

service checks when it initially starts. If this option is

disabled, checks are not actively made, but Nagios can still

receive and process passive check results that come in. Unless

you’re implementing redundant hosts or have a special need for

disabling the execution of service checks, leave this enabled!

Values: 1 = enable checks, 0 = disable checks

execute_service_checks=1

PASSIVE SERVICE CHECK ACCEPTANCE OPTION

This determines whether or not Nagios will accept passive

service checks results when it initially (re)starts.

Values: 1 = accept passive checks, 0 = reject passive checks

accept_passive_service_checks=1

HOST CHECK EXECUTION OPTION

This determines whether or not Nagios will actively execute

host checks when it initially starts. If this option is

disabled, checks are not actively made, but Nagios can still

receive and process passive check results that come in. Unless

you’re implementing redundant hosts or have a special need for

disabling the execution of host checks, leave this enabled!

Values: 1 = enable checks, 0 = disable checks

execute_host_checks=1

PASSIVE HOST CHECK ACCEPTANCE OPTION

This determines whether or not Nagios will accept passive

host checks results when it initially (re)starts.

Values: 1 = accept passive checks, 0 = reject passive checks

accept_passive_host_checks=1

NOTIFICATIONS OPTION

This determines whether or not Nagios will send out any host or

service notifications when it is initially (re)started.

Values: 1 = enable notifications, 0 = disable notifications

enable_notifications=1

EVENT HANDLER USE OPTION

This determines whether or not Nagios will run any host or

service event handlers when it is initially (re)started. Unless

you’re implementing redundant hosts, leave this option enabled.

Values: 1 = enable event handlers, 0 = disable event handlers

enable_event_handlers=1

PROCESS PERFORMANCE DATA OPTION

This determines whether or not Nagios will process performance

data returned from service and host checks. If this option is

enabled, host performance data will be processed using the

host_perfdata_command (defined below) and service performance

data will be processed using the service_perfdata_command (also

defined below). Read the HTML docs for more information on

performance data.

Values: 1 = process performance data, 0 = do not process performance data

process_performance_data=0

HOST AND SERVICE PERFORMANCE DATA PROCESSING COMMANDS

These commands are run after every host and service check is

performed. These commands are executed only if the

process_performance_data option (above) is set to 1. The command

argument is the short name of a command definition that you

define in your host configuration file. Read the HTML docs for

more information on performance data.

#host_perfdata_command=process-host-perfdata
#service_perfdata_command=process-service-perfdata

HOST AND SERVICE PERFORMANCE DATA FILES

These files are used to store host and service performance data.

Performance data is only written to these files if the

process_performance_data option (above) is set to 1.

#host_perfdata_file=/tmp/host-perfdata
#service_perfdata_file=/tmp/service-perfdata

HOST AND SERVICE PERFORMANCE DATA FILE TEMPLATES

These options determine what data is written (and how) to the

performance data files. The templates may contain macros, special

characters (\t for tab, \r for carriage return, \n for newline)

and plain text. A newline is automatically added after each write

to the performance data file. Some examples of what you can do are

shown below.

#host_perfdata_file_template=[HOSTPERFDATA]\t$TIMET$\t$HOSTNAME$\t$HOSTEXECUTIONTIME$\t$HOSTOUTPUT$\t$HOSTPERFDATA$
#service_perfdata_file_template=[SERVICEPERFDATA]\t$TIMET$\t$HOSTNAME$\t$SERVICEDESC$\t$SERVICEEXECUTIONTIME$\t$SERVICELATENCY$\t$SERVICEOUTPUT$\t$SERVICEPERFDATA$

HOST AND SERVICE PERFORMANCE DATA FILE MODES

This option determines whether or not the host and service

performance data files are opened in write (“w”) or append (“a”)

mode. If you want to use named pipes, you should use the special

pipe (“p”) mode, which avoids blocking at startup; otherwise you will

likely want the default append (“a”) mode.

#host_perfdata_file_mode=a
#service_perfdata_file_mode=a

HOST AND SERVICE PERFORMANCE DATA FILE PROCESSING INTERVAL

These options determine how often (in seconds) the host and service

performance data files are processed using the commands defined

below. A value of 0 indicates the files should not be periodically

processed.

#host_perfdata_file_processing_interval=0
#service_perfdata_file_processing_interval=0

HOST AND SERVICE PERFORMANCE DATA FILE PROCESSING COMMANDS

These commands are used to periodically process the host and

service performance data files. The interval at which the

processing occurs is determined by the options above.

#host_perfdata_file_processing_command=process-host-perfdata-file
#service_perfdata_file_processing_command=process-service-perfdata-file

OBSESS OVER SERVICE CHECKS OPTION

This determines whether or not Nagios will obsess over service

checks and run the ocsp_command defined below. Unless you’re

planning on implementing distributed monitoring, do not enable

this option. Read the HTML docs for more information on

implementing distributed monitoring.

Values: 1 = obsess over services, 0 = do not obsess (default)

obsess_over_services=0

OBSESSIVE COMPULSIVE SERVICE PROCESSOR COMMAND

This is the command that is run for every service check that is

processed by Nagios. This command is executed only if the

obsess_over_services option (above) is set to 1. The command

argument is the short name of a command definition that you

define in your host configuration file. Read the HTML docs for

more information on implementing distributed monitoring.

#ocsp_command=somecommand

OBSESS OVER HOST CHECKS OPTION

This determines whether or not Nagios will obsess over host

checks and run the ochp_command defined below. Unless you’re

planning on implementing distributed monitoring, do not enable

this option. Read the HTML docs for more information on

implementing distributed monitoring.

Values: 1 = obsess over hosts, 0 = do not obsess (default)

obsess_over_hosts=0

OBSESSIVE COMPULSIVE HOST PROCESSOR COMMAND

This is the command that is run for every host check that is

processed by Nagios. This command is executed only if the

obsess_over_hosts option (above) is set to 1. The command

argument is the short name of a command definition that you

define in your host configuration file. Read the HTML docs for

more information on implementing distributed monitoring.

#ochp_command=somecommand

TRANSLATE PASSIVE HOST CHECKS OPTION

This determines whether or not Nagios will translate

DOWN/UNREACHABLE passive host check results into their proper

state for this instance of Nagios. This option is useful

if you have distributed or failover monitoring setup. In

these cases your other Nagios servers probably have a different

“view” of the network, with regards to the parent/child relationship

of hosts. If a distributed monitoring server thinks a host

is DOWN, it may actually be UNREACHABLE from the point of

this Nagios instance. Enabling this option will tell Nagios

to translate any DOWN or UNREACHABLE host states it receives

passively into the correct state from the view of this server.

Values: 1 = perform translation, 0 = do not translate (default)

translate_passive_host_checks=0

PASSIVE HOST CHECKS ARE SOFT OPTION

This determines whether or not Nagios will treat passive host

checks as being HARD or SOFT. By default, a passive host check

result will put a host into a HARD state type. This can be changed

by enabling this option.

Values: 0 = passive checks are HARD, 1 = passive checks are SOFT

passive_host_checks_are_soft=0

ORPHANED HOST/SERVICE CHECK OPTIONS

These options determine whether or not Nagios will periodically

check for orphaned host and service checks. Since service checks are

not rescheduled until the results of their previous execution

instance are processed, there exists a possibility that some

checks may never get rescheduled. A similar situation exists for

host checks, although the exact scheduling details differ a bit

from service checks. Orphaned checks seem to be a rare

problem and should not happen under normal circumstances.

If you have problems with service checks never getting

rescheduled, make sure you have orphaned service checks enabled.

Values: 1 = enable checks, 0 = disable checks

check_for_orphaned_services=1
check_for_orphaned_hosts=1

SERVICE FRESHNESS CHECK OPTION

This option determines whether or not Nagios will periodically

check the “freshness” of service results. Enabling this option

is useful for ensuring passive checks are received in a timely

manner.

Values: 1 = enabled freshness checking, 0 = disable freshness checking

check_service_freshness=1

SERVICE FRESHNESS CHECK INTERVAL

This setting determines how often (in seconds) Nagios will

check the “freshness” of service check results. If you have

disabled service freshness checking, this option has no effect.

service_freshness_check_interval=60

HOST FRESHNESS CHECK OPTION

This option determines whether or not Nagios will periodically

check the “freshness” of host results. Enabling this option

is useful for ensuring passive checks are received in a timely

manner.

Values: 1 = enabled freshness checking, 0 = disable freshness checking

check_host_freshness=0

HOST FRESHNESS CHECK INTERVAL

This setting determines how often (in seconds) Nagios will

check the “freshness” of host check results. If you have

disabled host freshness checking, this option has no effect.

host_freshness_check_interval=60

ADDITIONAL FRESHNESS THRESHOLD LATENCY

This setting determines the number of seconds that Nagios

will add to any host and service freshness thresholds that

it calculates (those not explicitly specified by the user).

additional_freshness_latency=15

FLAP DETECTION OPTION

This option determines whether or not Nagios will try

and detect hosts and services that are “flapping”.

Flapping occurs when a host or service changes between

states too frequently. When Nagios detects that a

host or service is flapping, it will temporarily suppress

notifications for that host/service until it stops

flapping. Flap detection is very experimental, so read

the HTML documentation before enabling this feature!

Values: 1 = enable flap detection

0 = disable flap detection (default)

enable_flap_detection=1

FLAP DETECTION THRESHOLDS FOR HOSTS AND SERVICES

Read the HTML documentation on flap detection for

an explanation of what this option does. This option

has no effect if flap detection is disabled.

low_service_flap_threshold=25.0
high_service_flap_threshold=50.0
low_host_flap_threshold=25.0
high_host_flap_threshold=50.0

DATE FORMAT OPTION

This option determines how short dates are displayed. Valid options

include:

us (MM-DD-YYYY HH:MM:SS)

euro (DD-MM-YYYY HH:MM:SS)

iso8601 (YYYY-MM-DD HH:MM:SS)

strict-iso8601 (YYYY-MM-DDTHH:MM:SS)

date_format=euro
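As a rough illustration of the four layouts listed above, the sketch below maps each `date_format` value to a Python `strftime` pattern. The mapping is read off the patterns in the comment; the dictionary and function names are hypothetical, and this is not how Nagios itself implements the option.

```python
from datetime import datetime

# strftime equivalents of the layouts named in the comment block above.
STRFTIME_BY_DATE_FORMAT = {
    "us": "%m-%d-%Y %H:%M:%S",
    "euro": "%d-%m-%Y %H:%M:%S",
    "iso8601": "%Y-%m-%d %H:%M:%S",
    "strict-iso8601": "%Y-%m-%dT%H:%M:%S",
}


def format_short_date(moment, date_format="euro"):
    """Render a datetime the way the chosen date_format value describes."""
    return moment.strftime(STRFTIME_BY_DATE_FORMAT[date_format])


moment = datetime(2010, 2, 18, 14, 34, 0)
for name in STRFTIME_BY_DATE_FORMAT:
    print(name, format_short_date(moment, name))
```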

TIMEZONE OFFSET

This option is used to override the default timezone that this

instance of Nagios runs in. If not specified, Nagios will use

the system configured timezone.

NOTE: In order to display the correct timezone in the CGIs, you

will also need to alter the Apache directives for the CGI path

to include your timezone. Example:

<Directory "/usr/local/nagios/sbin/">

SetEnv TZ "Australia/Brisbane"

</Directory>

#use_timezone=US/Mountain
#use_timezone=Australia/Brisbane

P1.PL FILE LOCATION

This value determines where the p1.pl perl script (used by the

embedded Perl interpreter) is located. If you didn’t compile

Nagios with embedded Perl support, this option has no effect.

p1_file=/usr/local/nagios/bin/p1.pl

EMBEDDED PERL INTERPRETER OPTION

This option determines whether or not the embedded Perl interpreter

will be enabled during runtime. This option has no effect if Nagios

has not been compiled with support for embedded Perl.

Values: 0 = disable interpreter, 1 = enable interpreter

enable_embedded_perl=1

EMBEDDED PERL USAGE OPTION

This option determines whether or not Nagios will process Perl plugins

and scripts with the embedded Perl interpreter if the plugins/scripts

do not explicitly indicate whether or not it is okay to do so. Read

the HTML documentation on the embedded Perl interpreter for more

information on how this option works.

use_embedded_perl_implicitly=1

ILLEGAL OBJECT NAME CHARACTERS

This option allows you to specify illegal characters that cannot

be used in host names, service descriptions, or names of other

object types.

illegal_object_name_chars=`~!$%^&*|'"<>?,()=

ILLEGAL MACRO OUTPUT CHARACTERS

This option allows you to specify illegal characters that are

stripped from macros before being used in notifications, event

handlers, etc. This DOES NOT affect macros used in service or

host check commands.

The following macros are stripped of the characters you specify:

$HOSTOUTPUT$

$HOSTPERFDATA$

$HOSTACKAUTHOR$

$HOSTACKCOMMENT$

$SERVICEOUTPUT$

$SERVICEPERFDATA$

$SERVICEACKAUTHOR$

$SERVICEACKCOMMENT$

illegal_macro_output_chars=`~$&|'"<>
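A small sketch of what "stripped" means here: every character in the configured set is removed from the macro value before it is used. The character set is copied from the setting above (with a plain apostrophe); the function name and the sample strings are assumptions for illustration.

```python
# Character set taken from the illegal_macro_output_chars setting above.
ILLEGAL_MACRO_OUTPUT_CHARS = "`~$&|'\"<>"


def strip_illegal_chars(value, illegal=ILLEGAL_MACRO_OUTPUT_CHARS):
    """Return value with every character listed in `illegal` removed."""
    # Mapping a code point to None tells str.translate to delete it.
    return value.translate({ord(ch): None for ch in illegal})


# Hypothetical plugin output, invented for this example.
print(strip_illegal_chars("DISK OK <b>/var</b> | used=42%"))
```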

REGULAR EXPRESSION MATCHING

This option controls whether or not regular expression matching

takes place in the object config files. Regular expression

matching is used to match host, hostgroup, service, and service

group names/descriptions in some fields of various object types.

Values: 1 = enable regexp matching, 0 = disable regexp matching

use_regexp_matching=0

“TRUE” REGULAR EXPRESSION MATCHING

This option controls whether or not “true” regular expression

matching takes place in the object config files. This option

only has an effect if regular expression matching is enabled

(see above). If this option is DISABLED, regular expression

matching only occurs if a string contains wildcard characters

(* and ?). If the option is ENABLED, regexp matching occurs

all the time (which can be annoying).

Values: 1 = enable true matching, 0 = disable true matching

use_true_regexp_matching=0
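The interaction between the two regexp options can be summed up in a few lines. This is only a sketch of the decision described in the comments above (when a name is treated as a regular expression), not of the matching itself; the function name is hypothetical.

```python
def treated_as_regex(name, use_regexp_matching, use_true_regexp_matching):
    """Decide whether a config name would be treated as a regular expression."""
    if not use_regexp_matching:
        # Regexp matching is switched off entirely.
        return False
    if use_true_regexp_matching:
        # "True" matching: every name is treated as a regexp.
        return True
    # Otherwise only names containing the wildcard characters qualify.
    return "*" in name or "?" in name


print(treated_as_regex("web*", 1, 0))
print(treated_as_regex("web01", 1, 0))
print(treated_as_regex("web01", 1, 1))
```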

ADMINISTRATOR EMAIL/PAGER ADDRESSES

The email and pager address of a global administrator (likely you).

Nagios never uses these values itself, but you can access them by

using the $ADMINEMAIL$ and $ADMINPAGER$ macros in your notification

commands.

admin_email=[email protected]
admin_pager=pagenagios@localhost

DAEMON CORE DUMP OPTION

This option determines whether or not Nagios is allowed to create

a core dump when it runs as a daemon. Note that it is generally

considered bad form to allow this, but it may be useful for

debugging purposes. Enabling this option doesn’t guarantee that

a core file will be produced, but that’s just life…

Values: 1 - Allow core dumps

0 - Do not allow core dumps (default)

daemon_dumps_core=0

LARGE INSTALLATION TWEAKS OPTION

This option determines whether or not Nagios will take some shortcuts

which can save on memory and CPU usage in large Nagios installations.

Read the documentation for more information on the benefits/tradeoffs

of enabling this option.

Values: 1 - Enabled tweaks

0 - Disable tweaks (default)

use_large_installation_tweaks=0

ENABLE ENVIRONMENT MACROS

This option determines whether or not Nagios will make all standard

macros available as environment variables when host/service checks

and system commands (event handlers, notifications, etc.) are

executed. Enabling this option can cause performance issues in

large installations, as it will consume a bit more memory and (more

importantly) consume more CPU.

Values: 1 - Enable environment variable macros (default)

0 - Disable environment variable macros

enable_environment_macros=1

CHILD PROCESS MEMORY OPTION

This option determines whether or not Nagios will free memory in

child processes (processes used to execute system commands and host/

service checks). If you specify a value here, it will override

program defaults.

Value: 1 - Free memory in child processes

0 - Do not free memory in child processes

#free_child_process_memory=1

CHILD PROCESS FORKING BEHAVIOR

This option determines how Nagios will fork child processes

(used to execute system commands and host/service checks). Normally

child processes are fork()ed twice, which provides a very high level

of isolation from problems. Fork()ing once is probably enough and will

save a great deal on CPU usage (in large installs), so you might

want to consider using this. If you specify a value here, it will

override program defaults.

Value: 1 - Child processes fork() twice

0 - Child processes fork() just once

#child_processes_fork_twice=1

DEBUG LEVEL

This option determines how much (if any) debugging information will

be written to the debug file. OR values together to log multiple

types of information.

Values:

-1 = Everything

0 = Nothing

1 = Functions

2 = Configuration

4 = Process information

8 = Scheduled events

16 = Host/service checks

32 = Notifications

64 = Event broker

128 = External commands

256 = Commands

512 = Scheduled downtime

1024 = Comments

2048 = Macros

debug_level=0
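Since the values are bit flags, a combined `debug_level` is just the bitwise OR of the categories you want. The sketch below uses the numbers from the list above; the short category names are my own labels, not Nagios identifiers.

```python
# Flag values copied from the list above; the key names are just labels.
DEBUG_FLAGS = {
    "functions": 1,
    "configuration": 2,
    "process": 4,
    "events": 8,
    "checks": 16,
    "notifications": 32,
    "event_broker": 64,
    "external_commands": 128,
    "commands": 256,
    "downtime": 512,
    "comments": 1024,
    "macros": 2048,
}


def debug_level(*names):
    """OR together the flags for the named categories."""
    level = 0
    for name in names:
        level |= DEBUG_FLAGS[name]
    return level


def decode(level):
    """List the categories enabled by a debug_level value (-1 enables all)."""
    return [name for name, bit in DEBUG_FLAGS.items() if level & bit]


# Scheduled events plus host/service checks, useful when chasing latency.
print(debug_level("events", "checks"))
print(decode(24))
```

For a latency problem like this one, logging only scheduled events and checks keeps the debug file far smaller than `-1` would.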

DEBUG VERBOSITY

This option determines how verbose the debug log output will be.

Values: 0 = Brief output

1 = More detailed

2 = Very detailed

debug_verbosity=1

DEBUG FILE

This option determines where Nagios should write debugging information.

debug_file=/usr/local/nagios/var/nagios.debug

MAX DEBUG FILE SIZE

This option determines the maximum size (in bytes) of the debug file. If

the file grows larger than this size, it will be renamed with a .old

extension. If a file already exists with a .old extension it will

automatically be deleted. This helps ensure your disk space usage doesn’t

get out of control when debugging Nagios.

max_debug_file_size=1000000
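The rotation rule described for the debug file (rename to `.old` once the size limit is passed, discarding any earlier `.old`) can be sketched as follows. This illustrates the rule only and is not Nagios source code; the function name and the temporary file used in the demo are assumptions.

```python
import os
import tempfile


def rotate_if_too_large(path, max_size):
    """Rename path to path + '.old' once it is larger than max_size bytes."""
    if not os.path.exists(path) or os.path.getsize(path) <= max_size:
        return False
    # os.replace overwrites an existing .old file, so the previous one is lost.
    os.replace(path, path + ".old")
    return True


# Demo in a throwaway directory with a tiny limit.
with tempfile.TemporaryDirectory() as workdir:
    debug_path = os.path.join(workdir, "nagios.debug")
    with open(debug_path, "w") as handle:
        handle.write("x" * 2000)
    rotated = rotate_if_too_large(debug_path, max_size=1000)
    old_exists = os.path.exists(debug_path + ".old")
    current_exists = os.path.exists(debug_path)

print(rotated, old_exists, current_exists)
```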

Any ideas?

having never tested the performance of nagios before and after installing pnp or nagiosgraph i can’t say… the only thing which looks quite strange is the fact that you have so many active checks… it is not recommended to do host active checks on a scheduled basis… but that’s a different kind of problem…

if you have the chance to do it, simply backup everything and try to switch back to the old cfg without PNP… and see what happens… maybe it’s PNP hitting hard :slight_smile:

Luca

Hi Luca,

Thanks for the reply. Yeah, we still haven’t got PNP working. At the point the performance hit happened, all that had changed was that more HDD space and more RAM had been allocated to Nagios. We started it back up and BAM, performance hit. PNP was still far off at this point.

We have returned the memory to what it was, just for testing’s sake. That didn’t fix it, which was always unlikely, so we’re a little stuck: it doesn’t make sense what could have hit our performance so badly.

Are you explicitly asking for active host checks to be made? They shouldn’t be scheduled… possibly something got FIXED in your setup and that’s why you are seeing fewer active checks. OTOH the rise in check latency doesn’t make much sense in this view… :confused: Which brings me to the opposite reasoning… a host is down and is getting checked with a high latency (timeout of some sort) and is thus reducing the total number of possible checks… (active host checks are blocking in Nagios)

OK Luca,

I see what you’re saying about host checks, and I’ll look into this as a further performance booster once we get things back to the way they were. But as the graphs show, Nagios was running fine with these checks before, so they can’t be the cause.

The second part is too much of a coincidence: the performance hit happened right from the reboot. Something during that time frame has “happened” which has crippled our Nagios. It just doesn’t make much sense.

Thanks for the help Luca - any more ideas on what could be happening?

do you have any cpu load information before and after the config change?

I work with SiB. We have some Cacti graphs of the Nagios CPU usage which SiB can upload. (Please note that if we stop the Nagios daemon our CPU usage is ~0%, so it’s definitely engine related. Why would it be working harder AFTER a reboot?)

Our original theory was that it was some sort of cache. We found that by clearing down the status.dat / objects / retention files from /usr/local/nagios/var/ our check latency went from 60-90 seconds to 16-26 seconds (much improved, but not like it was). **EDIT: 18/02/10@14:34 - it has gone back to ~72 second latency a couple of hours after the clear down :? **
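In case it helps narrow things down, here is a rough sketch that lists the checks with the worst latency from a status.dat dump. It assumes the usual block layout (`hoststatus {` / `servicestatus {` sections containing `key=value` lines, with a `check_latency` field); the function names and the sample text are made up, so adjust to whatever your file actually contains.

```python
def parse_status_blocks(text):
    """Parse 'name { key=value ... }' sections into a list of dicts."""
    blocks = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.endswith("{") and "=" not in line:
            current = {"_type": line[:-1].strip()}
        elif line == "}":
            if current is not None:
                blocks.append(current)
            current = None
        elif current is not None and "=" in line:
            key, _, value = line.partition("=")
            current[key] = value
    return blocks


def worst_latency(blocks, limit=5):
    """Return (host, service, latency) tuples, highest latency first."""
    checks = [
        b for b in blocks
        if b["_type"] in ("hoststatus", "servicestatus") and "check_latency" in b
    ]
    checks.sort(key=lambda b: float(b["check_latency"]), reverse=True)
    return [
        (b.get("host_name", "?"),
         b.get("service_description", "(host check)"),
         float(b["check_latency"]))
        for b in checks[:limit]
    ]


# Hypothetical excerpt, invented for this example.
SAMPLE = """
hoststatus {
    host_name=web01
    check_latency=1.250
    }
servicestatus {
    host_name=web01
    service_description=PING
    check_latency=72.400
    }
"""
results = worst_latency(parse_status_blocks(SAMPLE))
for host, service, latency in results:
    print(host, service, latency)
```

To run it against the real file, read `/usr/local/nagios/var/status.dat` into a string and pass that in place of `SAMPLE`.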

I did think some downed host was causing the time-out, but we can see there are no hosts unavailable. SiB, can you post a screenshot of the “Tactical Overview” section to show what we have enabled/disabled/active/passive etc…

We have a full set of MRTG nagiostat graphs and also performance graphs, so if you need any, please ask and SiB or I will upload them.

Thanks for your help.

Tactical Overview attached:


CPU:


You can see where we took it down on Saturday. Results start again on the 16th. Not sure why the three day gap instead of a few hours, perhaps sneakymonkey can shed some light on this.

Regardless, you can see the difference in CPU utilisation. And a broader view:

CPU Broad:

Massive change!

./rc.snmpd start

It’s a bug I never fixed. snmpd doesn’t start up automatically… (another problem/another day), so Cacti couldn’t poll the data from Nagios. ANYWAY. :smiley:

[quote=“sneakymonkey”]./rc.snmpd start

its a bug i never fixed. snmpd doesn’t start up automatically… (another problem/another day) so therefore CACTI couldn’t poll the data from NAGIOS. ANYWAY. :D[/quote]

which means? everything fixed starting snmpd?

The gaps in the graphs that spanned three days were there because I hadn’t started snmpd on the Nagios box, so Cacti couldn’t poll the information. The check latency problem on Nagios is still an issue. Sorry for the confusion! :stuck_out_tongue:

aah ok, makes sense :smiley:

what is top saying? which processes exactly are using up the cpu… or possibly making the system wait?

Hi Luca,

Nothing really out of the ordinary in top: