I’m new to Nagios. At first, I was absolutely blown away by the excellent quality of the system, but in practice it’s been unreliable. My enthusiasm is waning.
The problem is that I frequently get false alerts, even from the basic PING service. Occasionally, when a host or service goes into a CRITICAL state, it won’t come out, even after the problem has cleared. Other hosts using the same PING command continue working fine. When a host or service goes into a CRITICAL state, its status line says:
CRITICAL - Plugin timed out after 10 seconds.
At first I noticed that Nagios had stopped executing plugins for the affected hosts. I found a reference in the documentation that advised turning on the “check_for_orphaned_services” feature in nagios.cfg, so I did that. Now the plugins seem to be rescheduled and executed, but the CRITICAL state is still reported in error and fails to clear. Once in a CRITICAL state, hosts sometimes stay there until I restart the Nagios process.
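For reference, the change amounts to a single line in nagios.cfg. This is only a sketch of that one directive, assuming the usual 1 = on, 0 = off convention; the rest of my file is left out:

```
# nagios.cfg (excerpt)
# Reschedule service checks whose results never came back (1 = enabled, 0 = disabled)
check_for_orphaned_services=1
```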
The problem is likely of my own doing, such as an improper configuration, rather than a fault in Nagios, but I can’t find it. Can someone tell me what I’m doing wrong? I’ve read a lot of the documentation (not all) and searched the mailing lists and forums, but I can’t figure it out. I appreciate any help.
I’m running Nagios 2.0b3 on CentOS 4 (Red Hat) Linux. Would I have better luck with a non-beta version of Nagios?