I recently upgraded our Nagios monitoring system from an early 2.something version to the most recent stable release, 3.2.3, because we had huge check latency.
While performance improved dramatically, we still have a check latency of approximately 3 minutes, which I would rather not have.
I am monitoring 228 hosts and 4699 services. I turned on the large installation tweaks, changed from ping to fping, and made the recommended adjustments to the “reaper” settings and status file updates.
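For reference, the kind of nagios.cfg settings I mean are sketched below. The directive names are standard Nagios 3 main-config options, but the values are illustrative assumptions, not a copy of my actual file:

```
# Illustrative values only (assumptions), not the exact production settings.

# Enable the "large installation tweaks" shortcuts.
use_large_installation_tweaks=1

# Skip exporting macros as environment variables for every check.
enable_environment_macros=0

# How often (seconds) the reaper processes finished check results.
check_result_reaper_frequency=5

# Maximum time (seconds) a single reaper run may take.
max_check_result_reaper_time=30

# How often (seconds) the status file is rewritten.
status_update_interval=15
```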
The run queue / load average on the box is 2-4. I added two more virtual CPUs, and it didn’t make much difference to that figure or to the check latency.
For HA, /usr/local/nagios/etc and /usr/local/nagios/var are links to an NFS mount. I do not see any I/O wait, but perhaps I need to move some stuff to memory or to local (iSCSI) mounts?
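To make that concrete, here is a rough sketch of what moving the most frequently written files off NFS might look like in nagios.cfg. The directives are standard Nagios 3 options, but the /dev/shm/nagios paths are hypothetical, and this is only an idea, not something that is in place:

```
# Hypothetical paths on a memory-backed filesystem (assumption: /dev/shm is tmpfs).
# These directories vanish on reboot and must be recreated, owned by the
# nagios user, before the daemon starts.

# Spool directory for check results waiting to be reaped.
check_result_path=/dev/shm/nagios/checkresults

# Status file that is rewritten every status_update_interval seconds.
status_file=/dev/shm/nagios/status.dat

# Scratch file and scratch directory used by the daemon.
temp_file=/dev/shm/nagios/nagios.tmp
temp_path=/dev/shm/nagios/tmp

# Retention data has to survive restarts and failover, so it stays on the
# shared mount.
state_retention_file=/usr/local/nagios/var/retention.dat
```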
Is it possible to perform this many active checks, or do I need to look at a federated set-up, with slaves doing the active monitoring and submitting results to a passive master?