I’ve set up a few Nagios boxes, but this one really takes the cake. It’s a Fedora Core 4 box running Apache 2.0.55 and Nagios 1.3.
It had been running fine for a few days, then all of a sudden it froze up. Even after a reboot it won’t start checking hosts; it just sits there doing nothing.
I’m really at a loss with this. I had a Nagios 1.2 box that has been up and running for about a year with no problems, but this one won’t work. I’ve tried version 2.0, but that won’t work at all. I’m wondering if it has to do with FC4, as I used FC3 before.
Hm… rather interesting… how do you know that Nagios isn’t checking anything? From the output of your process grep, it looks to me like Nagios is checking to see that a host is up…
I’m pretty sure that the issue isn’t with FC4… it might be with the way it was installed, but I’ve got FC3 and FC4 running, both of them with Nagios on them, and things are running smoothly.
Now it only pings 5 times rather than 15, so it’s not waiting for 15 ICMP packets. Pinging 15 times worked fine with the old config but didn’t here.
It’s running okay now, but I’m not sure whether this is what fixed it. It’s completing all 313 checks in 5 minutes or less, which is good; before, it wasn’t even doing half that in an hour!
Could someone post their checkcommands.cfg file so I can have a look at the check_ping and check_host_alive directives and see what you have used for them?
I think this might be where the problem lies.
You have a warning at 80% packet loss… that’s already worth a critical in my opinion. I’d use 3 packets with warning at 40% (which trips at 2 packets lost) and set retry_interval to 1 with 3 retries. This would cause a notification to be sent out if 3 checks fail… that means 6 packets out of 9 got lost within 2 minutes…
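To double-check the arithmetic in that suggestion, here is a rough Python sketch. It is not anything Nagios itself runs; the function names are made up for illustration, and it assumes a state trips once the loss percentage reaches the threshold (greater than or equal), and that retries are spaced exactly one retry interval apart.

```python
def lost_packets_to_trip(packets, threshold_pct):
    """Smallest number of lost packets whose loss percentage reaches threshold_pct."""
    for lost in range(packets + 1):
        # Integer comparison avoids float rounding.
        # Assumption: loss >= threshold is what trips the state.
        if lost * 100 >= threshold_pct * packets:
            return lost
    return None  # a threshold above 100% can never be reached


def loss_needed_for_notification(packets, threshold_pct, attempts, retry_minutes):
    """Return (packets_sent, minimum_packets_lost, minutes_from_first_to_last_check)."""
    per_check = lost_packets_to_trip(packets, threshold_pct)
    return (packets * attempts, per_check * attempts, (attempts - 1) * retry_minutes)


if __name__ == "__main__":
    # 3 packets per check, warning at 40% loss
    print(lost_packets_to_trip(3, 40))                 # 2
    # 3 attempts, retries 1 minute apart
    print(loss_needed_for_notification(3, 40, 3, 1))   # (9, 6, 2)
```

So with 3 packets a single lost packet (33%) stays under the 40% line, and it takes at least 6 lost packets out of 9 over roughly 2 minutes before a notification goes out.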
define command{
    command_name    check_ping
    command_line    $USER1$/check_ping -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$ -p 2
    }
Yes, -p 15 was excessive.
To see why I changed to -p 2, read the tips here: nagios.sourceforge.net/docs/1_0/tuning.html
Item #7 is the one in particular.
Thanks guys, it seems to be behaving itself. I’m using 3 packets because almost all the checks go via a VPN across the net to reach these hosts. There can be some internet flappage, so I’ve used slightly higher tolerances so Nagios doesn’t cry wolf too often.