Service not updating on time

Mudasar · February 9, 2010, 10:10pm

I set up a new Centreon/Nagios installation and added more than 150 hosts and 1500 services, but the host and service statuses update very late.
For example, I added a Java process to monitor. It says the next scheduled check is in 3 minutes, but it sometimes takes more than 2 hours.

Java PENDING N/A 0d 0h 34m 2s+ 1/3

I have tuned Nagios, increased Maximum Concurrent Service Checks to 2000, and disabled passive checks.

As advised in the following thread, I have disabled “Agressive Host Checks” and enabled “Parallel Check” of services, but still no luck:

forum.centreon.com/archive/index.php/t-2959.html

Thanks
MY

Mudasar · February 10, 2010, 4:48am

Does anybody have any advice?

luca · February 10, 2010, 11:10am

150/1500 can be a lot.
I don’t know how Centreon gets into the equation, but you could simply have too many checks for your machine (that shouldn’t be the case if you are not running all checks every minute, but there’s not enough info). How is the load on the machine?

Mudasar · February 10, 2010, 11:51pm

Hi,

Thanks for your reply. Centreon is just a front-end configuration tool; behind it, everything uses the Nagios configuration. We are monitoring the same number of hosts/services on a less powerful machine.

Yes, I have stopped and started it multiple times. Following is the nagios.cfg, which might give a better idea.

cfg_file=/usr/local/nagios/etc/hostTemplates.cfg
cfg_file=/usr/local/nagios/etc/hosts.cfg
cfg_file=/usr/local/nagios/etc/serviceTemplates.cfg
cfg_file=/usr/local/nagios/etc/services.cfg
cfg_file=/usr/local/nagios/etc/misccommands.cfg
cfg_file=/usr/local/nagios/etc/checkcommands.cfg
cfg_file=/usr/local/nagios/etc/contactgroups.cfg
cfg_file=/usr/local/nagios/etc/contacts.cfg
cfg_file=/usr/local/nagios/etc/hostgroups.cfg
cfg_file=/usr/local/nagios/etc/servicegroups.cfg
cfg_file=/usr/local/nagios/etc/timeperiods.cfg
cfg_file=/usr/local/nagios/etc/escalations.cfg
cfg_file=/usr/local/nagios/etc/dependencies.cfg
cfg_file=/usr/local/nagios/etc/meta_commands.cfg
cfg_file=/usr/local/nagios/etc/meta_contactgroup.cfg
cfg_file=/usr/local/nagios/etc/meta_contact.cfg
cfg_file=/usr/local/nagios/etc/meta_dependencies.cfg
cfg_file=/usr/local/nagios/etc/meta_escalations.cfg
cfg_file=/usr/local/nagios/etc/meta_hostgroup.cfg
cfg_file=/usr/local/nagios/etc/meta_host.cfg
cfg_file=/usr/local/nagios/etc/meta_services.cfg
cfg_file=/usr/local/nagios/etc/meta_timeperiod.cfg
resource_file=/usr/local/nagios/etc//resource.cfg
log_file=/usr/local/nagios/var/nagios.log
temp_file=/usr/local/nagios/var/nagios.tmp
status_file=/usr/local/nagios/var/status.log
p1_file=/usr/local/nagios/bin/p1.pl
status_update_interval=15
nagios_user=nagios
nagios_group=nagios
enable_notifications=1
execute_service_checks=1
accept_passive_service_checks=1
enable_event_handlers=1
log_rotation_method=d
log_archive_path=/usr/local/nagios/var/archives/
check_external_commands=1
command_check_interval=1s
command_file=/usr/local/nagios/var/rw/nagios.cmd
lock_file=/usr/local/nagios/var/nagios.lock
retain_state_information=1
retention_update_interval=60
use_retained_program_state=1
use_retained_scheduling_info=1
use_syslog=0
log_notifications=1
log_service_retries=1
log_host_retries=1
log_event_handlers=1
log_initial_states=1
log_external_commands=1
sleep_time=1
service_inter_check_delay_method=s
service_interleave_factor=s
max_concurrent_checks=2000
check_result_reaper_frequency=5
interval_length=60
use_agressive_host_checking=0
enable_flap_detection=0
low_service_flap_threshold=25.0
high_service_flap_threshold=50.0
low_host_flap_threshold=25.0
high_host_flap_threshold=50.0
soft_state_dependencies=0
service_check_timeout=60
host_check_timeout=10
event_handler_timeout=30
notification_timeout=30
ocsp_timeout=5
ochp_timeout=5
perfdata_timeout=5
obsess_over_services=0
process_performance_data=1
service_perfdata_command=process-service-perfdata
host_perfdata_file_mode=2
service_perfdata_file_mode=2
check_for_orphaned_services=0
check_service_freshness=0
date_format=euro
illegal_object_name_chars=~!$%^&*"|'<>?,()=
illegal_macro_output_chars=`~$^&"|'<>
admin_email=admin
admin_pager=admin@localhost
broker_module=/usr/local/nagios/bin/ndomod.o config_file=/usr/local/nagios/etc/ndomod.cfg
event_broker_options=-1
cached_host_check_horizon=60
cached_service_check_horizon=60
use_large_installation_tweaks=1

Thanks
MY

mmestnik · February 11, 2010, 1:09am

I don’t have a lot of experience in this area, but I thought you might want to adjust these:

command_check_interval=2 # (120s, 2 minutes)
retention_update_interval: every hour may be too long.
check_result_reaper_frequency=1
service_check_timeout: lower this if you can. After 15 seconds things are unbearably slow, though high values are nice when you are dealing with a stressed service.

I’d say it’s check_result_reaper_frequency that’s killing you: this is 2000 checks every 5 seconds, or fewer if checks take longer than 5 seconds. Even 1 is too large a number, but I don’t know if floats are acceptable here. I would aim for something like 300 ms (just over 3 times a second).
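
The units behind these settings are easy to mix up. As a rough sketch (not taken from the Nagios source), the Python below converts the values discussed here to seconds. It assumes what the Nagios 3 documentation describes: command_check_interval is in seconds when it ends in "s" and otherwise in units of interval_length, and retention_update_interval is in minutes. The function names are made up for illustration.

```python
def command_check_interval_seconds(value: str, interval_length: int = 60) -> int:
    """Convert a command_check_interval value to seconds.

    Assumption (per the Nagios 3 docs): a trailing "s" means seconds,
    a bare number means that many units of interval_length.
    The special value -1 ("as often as possible") is not handled here.
    """
    value = value.strip()
    if value.endswith("s"):
        return int(value[:-1])
    return int(value) * interval_length


def retention_update_interval_seconds(minutes: int) -> int:
    """retention_update_interval is documented in minutes."""
    return minutes * 60


posted = command_check_interval_seconds("1s")      # value in the posted nagios.cfg
suggested = command_check_interval_seconds("2")    # value suggested in this post
retention = retention_update_interval_seconds(60)  # value in the posted nagios.cfg

print(posted, suggested, retention)
```

With interval_length=60 this prints 1 120 3600, so going from "1s" to "2" makes Nagios look at the external command file less often, not more.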

Mudasar · February 11, 2010, 4:25am

I have made the changes you suggested, but no luck.

luca · February 11, 2010, 7:33am

How is the load on the machine?
Is it in wait or idle?

Mudasar · February 12, 2010, 3:02am

Hi,

Following is the output of the nagios -s check; please let me know if it shows any issue.

/usr/local/nagios/bin/nagios -s /usr/local/nagios/etc/nagios.cfg

Nagios Core 3.2.0
Copyright © 2009 Nagios Core Development Team and Community Contributors
Copyright © 1999-2009 Ethan Galstad
Last Modified: 08-12-2009
License: GPL

Website: nagios.org
Timing information on object configuration processing is listed
below. You can use this information to see if precaching your
object configuration would be useful.

Object Config Source: Config files (uncached)

OBJECT CONFIG PROCESSING TIMES (* = Potential for precache savings with -u option)

Read: 0.011252 sec
Resolve: 0.001085 sec *
Recomb Contactgroups: 0.000101 sec *
Recomb Hostgroups: 0.000201 sec *
Dup Services: 0.002897 sec *
Recomb Servicegroups: 0.000467 sec *
Duplicate: 0.000004 sec *
Inherit: 0.000470 sec *
Recomb Contacts: 0.000002 sec *
Sort: 0.000002 sec *
Register: 0.004701 sec
Free: 0.000852 sec
============
TOTAL: 0.022036 sec * = 0.005231 sec (23.74%) estimated savings

RETENTION DATA TIMES

Read and Process: 0.041698 sec
============
TOTAL: 0.041698 sec

Timing information on configuration verification is listed below.

CONFIG VERIFICATION TIMES (* = Potential for speedup with -x option)

Object Relationships: 0.002996 sec
Circular Paths: 0.000006 sec *
Misc: 0.000652 sec
============
TOTAL: 0.003654 sec * = 0.000006 sec (0.2%) estimated savings

EVENT SCHEDULING TIMES

Get service info: 0.002991 sec
Get host info info: 0.000372 sec
Get service params: 0.000011 sec
Schedule service times: 0.006283 sec
Schedule service events: 0.001292 sec
Get host params: 0.000002 sec
Schedule host times: 0.000772 sec
Schedule host events: 0.000675 sec
============
TOTAL: 0.012398 sec

Projected scheduling information for host and service checks
is listed below. This information assumes that you are going
to start running Nagios with your current config files.

HOST SCHEDULING INFORMATION

Total hosts: 133
Total scheduled hosts: 132
Host inter-check delay method: SMART
Average host check interval: 116.36 sec
Host inter-check delay: 0.88 sec
Max host check spread: 30 min
First scheduled check: Thu Feb 11 18:58:36 2010
Last scheduled check: Thu Feb 11 19:00:31 2010

SERVICE SCHEDULING INFORMATION

Total services: 1071
Total scheduled services: 1070
Service inter-check delay method: SMART
Average service check interval: 185.05 sec
Inter-check delay: 0.17 sec
Interleave factor method: SMART
Average services per host: 8.05
Service interleave factor: 9
Max service check spread: 30 min
First scheduled check: Thu Feb 11 18:58:56 2010
Last scheduled check: Thu Feb 11 19:02:01 2010

CHECK PROCESSING INFORMATION

Check result reaper interval: 1 sec
Max concurrent service checks: Unlimited

PERFORMANCE SUGGESTIONS

Thanks
MY
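
The scheduling figures in this output can be cross-checked by hand. The sketch below assumes the SMART inter-check delay is simply the average check interval divided by the number of scheduled checks, which is how the Nagios documentation describes it; the function name is hypothetical. The inputs are copied from the output above.

```python
def smart_inter_check_delay(avg_check_interval_s: float, total_scheduled: int) -> float:
    """Seconds between consecutive scheduled checks under the SMART method (assumed formula)."""
    return avg_check_interval_s / total_scheduled


# Figures from the `nagios -s` output in this post.
service_delay = smart_inter_check_delay(185.05, 1070)
host_delay = smart_inter_check_delay(116.36, 132)

# Rate Nagios has to sustain to keep every service on schedule.
required_service_checks_per_sec = 1.0 / service_delay

print(f"service inter-check delay: {service_delay:.2f} s")
print(f"host inter-check delay:    {host_delay:.2f} s")
print(f"required service checks/s: {required_service_checks_per_sec:.2f}")
```

The two delays come out at 0.17 s and 0.88 s, matching what Nagios reports, and the schedule works out to a little under 6 service checks per second. That number is the baseline to compare against what the daemon actually achieves.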

mmestnik · February 13, 2010, 12:37am

The output of “top -n 1” a few times would answer luca’s question.

Mudasar · February 13, 2010, 4:00am

Hi,

./nagiostats

CURRENT STATUS DATA

Status File: /usr/local/nagios/var/status.log
Status File Age: 0d 0h 0m 15s
Status File Version: 3.2.0

Program Running Time: 0d 0h 10m 55s
Nagios PID: 496
Used/High/Total Command Buffers: 0 / 0 / 4096

Total Services: 999
Services Checked: 999
Services Scheduled: 998
Services Actively Checked: 999
Services Passively Checked: 0
Total Service State Change: 0.000 / 33.090 / 0.348 %
Active Service Latency: 0.064 / 1009.019 / 264.398 sec
Active Service Execution Time: 0.013 / 25.305 / 2.098 sec
Active Service State Change: 0.000 / 33.090 / 0.348 %
Active Services Last 1/5/15/60 min: 27 / 316 / 838 / 998
Passive Service Latency: 0.000 / 0.000 / 0.000 sec
Passive Service State Change: 0.000 / 0.000 / 0.000 %
Passive Services Last 1/5/15/60 min: 0 / 0 / 0 / 0
Services Ok/Warn/Unk/Crit: 975 / 10 / 3 / 11
Services Flapping: 0
Services In Downtime: 0

Total Hosts: 133
Hosts Checked: 133
Hosts Scheduled: 132
Hosts Actively Checked: 133
Host Passively Checked: 0
Total Host State Change: 0.000 / 0.000 / 0.000 %
Active Host Latency: 0.000 / 283.449 / 153.985 sec
Active Host Execution Time: 2.055 / 10.036 / 2.916 sec
Active Host State Change: 0.000 / 0.000 / 0.000 %
Active Hosts Last 1/5/15/60 min: 0 / 20 / 132 / 132
Passive Host Latency: 0.000 / 0.000 / 0.000 sec
Passive Host State Change: 0.000 / 0.000 / 0.000 %
Passive Hosts Last 1/5/15/60 min: 0 / 0 / 0 / 0
Hosts Up/Down/Unreach: 129 / 4 / 0
Hosts Flapping: 0
Hosts In Downtime: 0

Active Host Checks Last 1/5/15 min: 1 / 37 / 215
Scheduled: 0 / 18 / 166
On-demand: 1 / 19 / 49
Parallel: 1 / 26 / 180
Serial: 0 / 0 / 0
Cached: 0 / 11 / 35
Passive Host Checks Last 1/5/15 min: 0 / 0 / 0
Active Service Checks Last 1/5/15 min: 43 / 373 / 854
Scheduled: 43 / 373 / 854
On-demand: 0 / 0 / 0
Cached: 0 / 0 / 0
Passive Service Checks Last 1/5/15 min: 0 / 0 / 0

External Commands Last 1/5/15 min: 0 / 0 / 0

Thanks
My
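
One way to read these numbers is to compare the check rate Nagios achieved with the rate the schedule needs. The sketch below parses two lines of the nagiostats output above. The helper name is hypothetical, and the 185.05 s average check interval is an assumption carried over from the earlier nagios -s run, where the service count was slightly different.

```python
def parse_triple(line: str) -> tuple:
    """Parse 'Label: a / b / c [sec|%]' from nagiostats into three floats."""
    _, _, rest = line.partition(":")
    rest = rest.replace("sec", "").replace("%", "")
    return tuple(float(part) for part in rest.split("/"))


# Lines copied from the nagiostats output in this post.
lat_min, lat_max, lat_avg = parse_triple(
    "Active Service Latency: 0.064 / 1009.019 / 264.398 sec")
last_1, last_5, last_15 = parse_triple(
    "Active Service Checks Last 1/5/15 min: 43 / 373 / 854")

scheduled_services = 998     # "Services Scheduled" in the output above
avg_check_interval = 185.05  # assumption: taken from the earlier `nagios -s` run

observed_rate = last_5 / 300.0                         # checks per second, last 5 minutes
needed_rate = scheduled_services / avg_check_interval  # checks per second to stay on schedule

print(f"average latency:  {lat_avg:.0f} s (average interval {avg_check_interval:.0f} s)")
print(f"observed rate:    {observed_rate:.2f} checks/s")
print(f"needed rate:      {needed_rate:.2f} checks/s")
```

Under those assumptions the daemon was running roughly 1.2 checks per second against the 5.4 or so it needed, and the average latency was longer than a full check interval. That pattern suggests the scheduler itself was being held up rather than individual checks running slowly, since the average execution time was only about 2 seconds.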

luca · February 13, 2010, 12:34pm

i give up, sorry :-/

mmestnik · February 13, 2010, 8:38pm

luca,
The last post from Mudasar seemed to indicate some form of mild success. I’d like to see what the status is after a few hours or a day.

Mudasar,
What do you think was the single biggest contributing factor to this success? Were there any secondary factors that you feel others with your problem might need to look at?

luca · February 13, 2010, 8:45pm

I am aware there might be a form of success, but asking for the server load three times and getting an inconsistent answer each time doesn’t really cheer you up.

And when it happens day after day after day, you sometimes get a bit demotivated.

Mudasar · February 16, 2010, 9:17pm

Hi,

Thanks for taking the time to reply. I will not demotivate you :lol: . I think the issue has been resolved. After adding a lot of hosts and services I enabled notifications, but the local SMTP relay was not working properly. After reconfiguring the local SMTP relay, everything looks coooooooooooool.

One more thing: I was also facing a graphs issue, which has also been resolved.

Thanks once again for helping.

Thanks
MY
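
For anyone who lands here with the same symptoms, a quick way to rule out the mail relay is to ask it for a NOOP response. This is only a sketch using Python's standard smtplib; the host, port and timeout are assumptions and should be replaced with whatever the notification command really uses.

```python
import smtplib


def smtp_relay_responds(host: str = "127.0.0.1", port: int = 25,
                        timeout: float = 5.0) -> bool:
    """Return True if an SMTP server at host:port answers NOOP with code 250.

    Any connection error, timeout or SMTP protocol error is reported as False.
    """
    try:
        with smtplib.SMTP(host, port, timeout=timeout) as smtp:
            code, _message = smtp.noop()
            return code == 250
    except (OSError, smtplib.SMTPException):
        return False
```

Calling smtp_relay_responds() on the Nagios server and getting False, or a long pause before the answer, points at the relay. A notification command that hangs on a broken relay can wait up to notification_timeout (30 seconds in the config posted above) before it is killed.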