I'm wondering how to tell when it's time to implement distributed monitoring in Nagios, or whether there are certain thresholds to use as a guideline. I'm running Nagios 3.2 on SLES 11. We are currently monitoring 152 hosts and 1250 services, and that number seems to grow every day. The server's CPU utilization is constantly above 40% (very frequently it sits above 80%), and the load varies between 3 and 10. The system has dual Intel Xeon 3 GHz processors and 4 GB of memory. According to the performance information, my check latency is about 0.75 seconds.
I'm asking because some of our checks time out and throw an alert. It's annoying to get the false positives and then the recovery alerts from those services as well.
Any information or help would be appreciated!