Nagios performance configuration

I would like to ensure that my Nagios installation checks all of my services as quickly as possible, so that Nagios shows an accurate state of our network and servers and our reporting stays accurate.

I have shortened the max_service_check_spread value and turned off passive checks, as we don't use them. Is there any other way to get the quickest check times for all services and to ensure quick recovery detection?

What is your avg. check latency and avg execution time now?

Total Services: 532
Services Checked: 532
Services Scheduled: 505
Active Service Checks: 532
Passive Service Checks: 0
Total Service State Change: 0.000 / 15.070 / 0.329 %
Active Service Latency: 0.000 / 24.049 / 0.574 %
Active Service Execution Time: 0.017 / 15.135 / 0.435 sec
Active Service State Change: 0.000 / 15.070 / 0.329 %
Active Services Last 1/5/15/60 min: 27 / 145 / 305 / 468
Passive Service State Change: 0.000 / 0.000 / 0.000 %
Passive Services Last 1/5/15/60 min: 0 / 0 / 0 / 0
Services Ok/Warn/Unk/Crit: 530 / 2 / 0 / 0

I asked for averages, and from the above I can’t tell which value is which.
But anyway, whether the above are the max or the averages, either one is too big.
First, fix how long your checks take. Most likely, you have some poor settings for timeouts or something. Or perhaps your check_ping plugin is using something like -p 10 or maybe 5, when it should be -p 1.
See the nagios docs on how to trim things up.
nagios.sourceforge.net/docs/2_0/tuning.html

Plus, why are your services changing state so much? Just what are you monitoring, client PCs that are powered up/down often? If so, why bother?

The output is in the format of:

min / max / average
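To make that format concrete, here is a minimal sketch that splits one of the stats lines quoted above into its three values. The function name `parse_stat_line` is made up for illustration, and it only assumes the "Label: min / max / average unit" layout shown in this thread, nothing about Nagios internals.

```python
import re

def parse_stat_line(line):
    """Split 'Label: min / max / average [unit]' into (label, min, max, avg)."""
    # Everything before the first colon is the label; the values follow it.
    label, _, rest = line.partition(":")
    numbers = [float(n) for n in re.findall(r"\d+\.\d+|\d+", rest)]
    if len(numbers) != 3:
        # Lines such as the "Last 1/5/15/60 min" counters carry four values.
        raise ValueError(f"expected three values, got {len(numbers)}: {line!r}")
    low, high, avg = numbers
    return label.strip(), low, high, avg

label, low, high, avg = parse_stat_line(
    "Active Service Execution Time: 0.017 / 15.135 / 0.435 sec"
)
print(f"{label}: min={low} max={high} avg={avg}")
```

Read this way, the execution time line means the fastest check took 0.017 sec, the slowest 15.135 sec, and the average was 0.435 sec.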

This is my problem: I have gone through all of the tuning documentation, but it still seems too slow to complete all checks. My check_ping uses “-p 1”, as it should.

No, only servers. We just have a busy network with lots of activity.

With regard to tuning Nagios, I have set the value

max_concurrent_checks=0

I was under the impression that this allows Nagios to perform the checks as fast as the server allows. Is this correct, or should I calculate this value manually?

Also, I get this when running nagios -s:

PERFORMANCE SUGGESTIONS

I have no suggestions - things look okay.

Active Service Latency: 0.000 / 24.049 / 0.574
That tells me that Nagios is not slow. If the average latency of a check is 0.574 seconds, then on average a check starts within 0.574 seconds of the time it was scheduled to run. So why do you say Nagios isn’t completing all the checks, when the above info suggests it is?
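As a rough illustration of what that latency figure measures, the sketch below treats latency as the gap between when a check was scheduled and when it actually started, then summarizes it as min / max / average. The timestamps are invented for the example and are not taken from the installation discussed here.

```python
from statistics import mean

# Hypothetical (scheduled, actual_start) times in seconds, for illustration only.
checks = [(0.0, 0.1), (10.0, 10.2), (20.0, 20.0), (30.0, 32.5)]

def latencies(pairs):
    """Latency of each check: how late it started relative to its schedule."""
    return [start - scheduled for scheduled, start in pairs]

lat = latencies(checks)
summary = (min(lat), max(lat), mean(lat))
print("min / max / average latency: %.3f / %.3f / %.3f sec" % summary)
```

A single late check pushes the max up sharply while barely moving the average, which is one way to end up with a max of 24.049 next to an average of 0.574.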

max_concurrent_checks=0: that is correct.

It just seems slow to recover a lot of the time. Maybe it is just me being impatient.

However, one thing that is extremely slow to recover is Windows. I use nrpe_nt; are you aware of any issues with this?

I get the feeling I should be using NSCA to submit passive checks…

Exactly. Running active checks to remote hosts is a waste of time. You have to log in, run the check, wait for results, pour some coffee, …
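For anyone going the passive route, here is a minimal sketch of building the tab-delimited service result line that send_nsca reads on standard input (host, service description, return code, plugin output). The helper name `format_passive_result`, the host "winserver01" and the service "CPU Load" are made up for illustration; the sketch only formats the line and does not call send_nsca itself.

```python
def format_passive_result(host, service, return_code, output):
    """Build one tab-delimited service check result line for send_nsca stdin."""
    if return_code not in (0, 1, 2, 3):
        raise ValueError(
            "return code must be 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN)"
        )
    # Collapse tabs and newlines in the output so they cannot break the fields.
    clean = " ".join(str(output).split())
    return f"{host}\t{service}\t{return_code}\t{clean}\n"

# Hypothetical host and service names, for illustration only.
line = format_passive_result("winserver01", "CPU Load", 0, "OK - load 12%")
print(repr(line))
```

The host and service names have to match what is defined in the Nagios object configuration, and passive service checks would need to be switched back on (accept_passive_service_checks=1 in nagios.cfg), since they were turned off earlier in this thread.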