Hi there,
I’m new here, but I searched and didn’t find any question like mine.
We’re using Nagios to monitor our server farm, which consists of around 1500 hosts with more than 5000 different checks (NRPE as well as SNMP).
The problem we have is the check latency: on average it’s 150-200 seconds or more :(.
The number of concurrent checks is set to “0” (unlimited).
nagios -s nagios.cfg shows:
HOST SCHEDULING INFORMATION
Total hosts: 836
Total scheduled hosts: 0
Host inter-check delay method: SMART
Average host check interval: 0.00 sec
Host inter-check delay: 0.00 sec
Max host check spread: 15 min
First scheduled check: N/A
Last scheduled check: N/A
SERVICE SCHEDULING INFORMATION
Total services: 5451
Total scheduled services: 5447
Service inter-check delay method: SMART
Average service check interval: 775.09 sec
Inter-check delay: 0.14 sec
Interleave factor method: SMART
Average services per host: 6.52
Service interleave factor: 7
Max service check spread: 15 min
First scheduled check: Mon Nov 14 11:19:59 2005
Last scheduled check: Mon Nov 14 11:32:54 2005
CHECK PROCESSING INFORMATION
Service check reaper interval: 2 sec
Max concurrent service checks: Unlimited
PERFORMANCE SUGGESTIONS
I have no suggestions - things look okay.
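As a sanity check on the figures above, here is a rough sketch of the arithmetic behind them. It assumes the SMART inter-check delay is the average service check interval divided by the number of scheduled services, and that the SMART interleave factor is the average services per host rounded up, which is how I understand the Nagios scheduling documentation. The function names are my own and are not part of Nagios.

```python
import math

# Figures copied from the `nagios -s` output above.
TOTAL_SCHEDULED_SERVICES = 5447      # "Total scheduled services"
AVG_CHECK_INTERVAL_S = 775.09        # "Average service check interval"
AVG_SERVICES_PER_HOST = 6.52         # "Average services per host"


def smart_inter_check_delay(avg_interval_s: float, n_services: int) -> float:
    """Assumed SMART rule: spread all checks evenly over one average interval."""
    return avg_interval_s / n_services


def smart_interleave_factor(avg_services_per_host: float) -> int:
    """Assumed SMART rule: round the average services per host up."""
    return math.ceil(avg_services_per_host)


def required_checks_per_second(avg_interval_s: float, n_services: int) -> float:
    """Sustained rate the server must reach to keep latency from growing."""
    return n_services / avg_interval_s


delay = smart_inter_check_delay(AVG_CHECK_INTERVAL_S, TOTAL_SCHEDULED_SERVICES)
factor = smart_interleave_factor(AVG_SERVICES_PER_HOST)
rate = required_checks_per_second(AVG_CHECK_INTERVAL_S, TOTAL_SCHEDULED_SERVICES)

print(f"inter-check delay: {delay:.2f} sec")
print(f"interleave factor: {factor}")
print(f"required rate:     {rate:.2f} checks/sec")
```

If these assumptions hold, the schedule asks the server to start and reap roughly seven service checks every second, so latency would build up whenever the real throughput falls below that rate.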
If I reduce the number of checks to 4000 (I tried removing different service groups), the latency drops to 20-30 seconds.
Where could the problem be, and where should I turn?
The CPU usage on our dual-Xeon (2.4 GHz) Nagios server is not very high.