False alerts


#1

I’m new to Nagios. At first, I was absolutely blown away by the excellent quality of the system, but in practice it’s been unreliable. My enthusiasm is waning.

The problem is, I frequently get false alerts, even from the basic PING service. And it seems that occasionally when a host/service goes into a CRITICAL state it won’t come out, even if the problem has cleared. Other hosts using the same PING command continue working fine. When a host/service goes into a CRITICAL state, its status line says:

CRITICAL - Plugin timed out after 10 seconds.

At first I noticed that Nagios had stopped executing plugins for the affected hosts. I found a reference in the documentation that advised turning on the “check_for_orphaned_services” feature in nagios.cfg. I did that. Now it seems the plugins are being rescheduled and executed but the CRITICAL state still appears in error and fails to clear. Once in a CRITICAL state hosts sometimes stay there until I restart the Nagios process.

It’s likely a problem of my own doing, such as improper configuration, rather than a fault in Nagios, but I can’t find it. Can someone advise me what I’m doing wrong? I’ve read a lot of the documentation (not all) and searched the mailing list and forums, but I can’t figure it out. I appreciate any help.

I’m running Nagios 2.0b3 on CentOS 4 (Red Hat) Linux. Would I have better luck with a non-beta version of Nagios?

thanks


#2

I think I’ve found the solution to my problem. The ping command itself was taking longer than I assumed. Running check_ping from the command line showed that it was taking over 10 seconds to complete. I then used “check_ping -h” to get help on its usage and found a parameter (-t) that sets the timeout. I used “-t 15” to increase the timeout from the default 10 seconds to 15 seconds. This appears to have fixed the problem.


#3

You should also consider using the check_fping plugin instead.