Nagios performance configuration

lgill · October 23, 2006, 10:11am

I would like my Nagios installation to check all of my services as quickly as possible, so that Nagios shows an accurate picture of the state of our network and servers, and so that our reporting is accurate.

I have shortened the max_service_check_spread value and turned off passive checks, as we don't use them. Is there anything else I can do to get the quickest possible check times for all services and to ensure quick recovery detection?
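For context on what max_service_check_spread actually changes: the Nagios docs describe it as the maximum number of minutes, from startup, within which every regularly scheduled service gets its first check, with Nagios adjusting the inter-check delay if necessary. The sketch below models that under assumptions not taken from this thread: every service has the same 5-minute check interval, and the "smart" inter-check delay (average check interval divided by number of services) is in use. The function name is hypothetical.

```python
# Sketch: how max_service_check_spread bounds the *initial* scheduling of checks.
# Assumptions (hypothetical, not from this thread): all services share one
# check interval, and Nagios uses the "smart" inter-check delay calculation.


def inter_check_delay(total_services: int,
                      avg_check_interval_s: float,
                      max_spread_min: float) -> float:
    """Seconds between the first checks of consecutive services after startup."""
    # "Smart" delay: spread the first round evenly over one average check interval.
    smart = avg_check_interval_s / total_services
    # max_service_check_spread (in minutes) caps how long that first round may take.
    cap = (max_spread_min * 60) / total_services
    return min(smart, cap)


if __name__ == "__main__":
    services = 532       # from the stats posted in this thread
    interval = 300.0     # assumed 5-minute check interval
    for spread in (30, 5, 1):  # candidate max_service_check_spread values, minutes
        delay = inter_check_delay(services, interval, spread)
        print(f"spread={spread:>2} min  delay={delay:.3f} s  "
              f"first round takes ~{delay * services:.0f} s")
```

Under these assumptions, shortening the spread only changes anything when it is shorter than the check interval, and even then only for the first round of checks after startup; later checks follow the normal check interval.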

jakkedup · October 23, 2006, 11:11am

What are your average check latency and average execution time now?

lgill · October 23, 2006, 11:20am

Total Services: 532
Services Checked: 532
Services Scheduled: 505
Active Service Checks: 532
Passive Service Checks: 0
Total Service State Change: 0.000 / 15.070 / 0.329 %
Active Service Latency: 0.000 / 24.049 / 0.574 sec
Active Service Execution Time: 0.017 / 15.135 / 0.435 sec
Active Service State Change: 0.000 / 15.070 / 0.329 %
Active Services Last 1/5/15/60 min: 27 / 145 / 305 / 468
Passive Service State Change: 0.000 / 0.000 / 0.000 %
Passive Services Last 1/5/15/60 min: 0 / 0 / 0 / 0
Services Ok/Warn/Unk/Crit: 530 / 2 / 0 / 0

jakkedup · October 23, 2006, 3:58pm

I asked for averages, and from the above I can’t tell which numbers those are.
But either way, whether the figures above are maximums or averages, they are too big.
First, fix how long your checks take. Most likely you have poor timeout settings or something similar. Or perhaps your check_ping plugin is using something like -p 10, or maybe 5, when it should be -p 1.
See the nagios docs on how to trim things up.
nagios.sourceforge.net/docs/2_0/tuning.html

jakkedup · October 23, 2006, 4:01pm

Plus, why are your services changing state so much? Just what are you monitoring: client PCs that are powered up and down often? If so, why bother?

lgill · October 23, 2006, 4:16pm

The output is in the format of:

min / max / average
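If you want to track these figures over time, a minimal sketch for pulling the min / max / average triples out of output like the stats posted above might look like this. It assumes each line has the form `Name: min / max / average [unit]`; the regex and function name are hypothetical, and lines with other shapes (such as the four-value "Last 1/5/15/60 min" line) are simply skipped.

```python
import re

# Matches lines such as "Active Service Latency: 0.000 / 24.049 / 0.574 sec".
# The trailing unit ("sec", "%") is optional; lines with four values don't match.
STAT_LINE = re.compile(
    r"^(?P<name>[^:]+):\s*"
    r"(?P<min>[\d.]+)\s*/\s*(?P<max>[\d.]+)\s*/\s*(?P<avg>[\d.]+)"
    r"\s*(?P<unit>\S+)?\s*$"
)


def parse_stats(text: str) -> dict:
    """Return {name: (min, max, avg)} for every min / max / average line."""
    stats = {}
    for line in text.splitlines():
        m = STAT_LINE.match(line.strip())
        if m:
            stats[m["name"].strip()] = (
                float(m["min"]), float(m["max"]), float(m["avg"]))
    return stats


if __name__ == "__main__":
    sample = ("Active Service Latency: 0.000 / 24.049 / 0.574 sec\n"
              "Active Service Execution Time: 0.017 / 15.135 / 0.435 sec\n")
    lat_min, lat_max, lat_avg = parse_stats(sample)["Active Service Latency"]
    print(f"average latency {lat_avg} s, worst case {lat_max} s")
```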

This is my problem: I have gone through all of the tuning documentation, but it still seems too slow to complete all checks. My check_ping uses “-p 1”, as it should.

No, only servers; we just have a busy network with lots of activity.

lgill · October 23, 2006, 4:22pm

With regard to tuning Nagios, I have set the value

max_concurrent_checks=0

under the impression that this will allow Nagios to perform the checks as fast as the server allows. Is this correct, or should I calculate this value manually?

lgill · October 23, 2006, 4:24pm

Also, I get this when running nagios -s:

PERFORMANCE SUGGESTIONS

I have no suggestions - things look okay.

jakkedup · October 23, 2006, 6:08pm

Active Service Latency: 0.000 / 24.049 / 0.574
That tells me that Nagios is not slow. If the average latency of a check is .574 seconds, then on average checks are starting within .574 seconds of the time they were scheduled to run. So why do you say Nagios isn’t completing all the checks, when the above info suggests it is?
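In Nagios, check latency is the difference between the time a check was scheduled to run and the time it actually started. A minimal sketch of computing min / max / average latency from (scheduled, started) timestamps; the timestamps and function names are made up for illustration. It also shows that a small average can coexist with a large maximum, like the 24.049 s maximum in the stats above.

```python
from statistics import mean


def latencies(checks):
    """Latency of each check: actual start time minus scheduled time (seconds)."""
    return [started - scheduled for scheduled, started in checks]


def summarize(values):
    """Return (min, max, average), the same order as the stats above."""
    return min(values), max(values), mean(values)


if __name__ == "__main__":
    # Hypothetical timestamps: nine checks start a quarter second late,
    # and one starts 24 seconds late.
    checks = [(t * 60.0, t * 60.0 + 0.25) for t in range(9)]
    checks.append((600.0, 624.0))
    lo, hi, avg = summarize(latencies(checks))
    print(f"min {lo:.3f} / max {hi:.3f} / avg {avg:.3f} sec")
```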

max_concurrent_checks=0 is correct.
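A value of 0 tells Nagios not to limit concurrent checks. If you did want to size max_concurrent_checks yourself, one back-of-the-envelope estimate is Little's law (average checks in flight = checks started per second × average execution time). This is a swapped-in general technique, not necessarily the exact formula in the Nagios docs, and the 5-minute check interval used below is an assumption; the thread does not give it.

```python
def estimated_concurrent_checks(total_services: int,
                                avg_check_interval_s: float,
                                avg_execution_s: float) -> float:
    """Average number of checks running at once (Little's law estimate)."""
    # Average rate at which checks are started.
    checks_per_second = total_services / avg_check_interval_s
    # Little's law: items in flight = arrival rate x time each spends in flight.
    return checks_per_second * avg_execution_s


if __name__ == "__main__":
    # 532 services and 0.435 s average execution time come from the stats above;
    # the 300 s check interval is an assumption.
    print(f"{estimated_concurrent_checks(532, 300.0, 0.435):.2f} checks in flight")
```

With those numbers the estimate is about 0.77 checks running at any moment, far below anything a concurrency limit would need to restrain.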

lgill · October 24, 2006, 8:02am

It just seems slow to recover a lot of the time. Maybe I'm just being impatient.

However, one thing that is extremely slow to recover is Windows. I use nrpe_nt; are you aware of any issues with it?

I get the feeling I should be using NSCA to submit passive checks…

jakkedup · October 26, 2006, 3:44pm

Exactly. Running active checks against remote hosts is a waste of time. You have to log in, run the check, wait for results, pour some coffee, …
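For the passive route, send_nsca reads check results on standard input, by default one tab-separated line per service result: host name, service description, return code, plugin output. A minimal sketch of formatting such a line; the function name and the host and service names are hypothetical. The actual check would still run locally on the Windows host (for example from a scheduled task), with its output piped to something like `send_nsca -H <nagios server> -c send_nsca.cfg`.

```python
# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def nsca_line(host: str, service: str, return_code: int, output: str) -> str:
    """Format one passive service result for send_nsca's standard input:
    host<TAB>service<TAB>return_code<TAB>plugin_output<NEWLINE>."""
    if return_code not in (OK, WARNING, CRITICAL, UNKNOWN):
        raise ValueError(f"invalid return code: {return_code}")
    # Tabs and newlines would break the default field and record delimiters.
    for field in (host, service, output):
        if "\t" in field or "\n" in field:
            raise ValueError("fields must not contain tabs or newlines")
    return f"{host}\t{service}\t{return_code}\t{output}\n"


if __name__ == "__main__":
    # Hypothetical host and service names.
    print(nsca_line("winserver01", "CPU Load", OK, "OK - load 5%"), end="")
```

Passive checks would also have to be switched back on in nagios.cfg (accept_passive_service_checks=1), since they were turned off at the start of this thread.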