When I first start up Nagios 2.0, everything is fine and all checks come back OK. But when I reload Nagios for a configuration change, or after it has been running for a couple of days, my root host (a core switch) reports an “out of bounds” error with return code 127. Since this is my root host, all other hosts are then marked as unreachable. The only fix I have found once the error is reported is to stop Nagios, delete retention.dat and status.sav, and start Nagios again. Then it's fine until I make another config change or it has run for a few days. The plugins are the latest version, they are in the right directory, and they have the correct permissions and ownership.
Any ideas on why this fails after a period of time or on a config reload?
It’s a known Nagios fault (it goes as far back as V1): you should NOT use the “Restart” option. I think it’s in the Nagios FAQ. Always shut down the Nagios process and start it again manually.
The problem occurs because the restart command does not always end the previous process properly, and the old and new processes then conflict.
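If you want to see whether an old process really is still hanging around after a restart or reload, something like the rough Python sketch below might help. It only counts processes whose command name is exactly nagios in the output of `ps -eo pid,comm`. The function names are made up for this example, and the daemon name is an assumption about your install.

```python
import subprocess


def find_daemon_pids(ps_output, name="nagios"):
    """Return the PIDs whose command name equals `name`.

    `ps_output` is text in the two-column form printed by `ps -eo pid,comm`.
    The default name "nagios" is an assumption about how the daemon is named.
    """
    pids = []
    for line in ps_output.splitlines():
        parts = line.split(None, 1)
        # Skip the header row and any line that does not start with a PID.
        if len(parts) != 2 or not parts[0].isdigit():
            continue
        pid, comm = parts
        if comm.strip() == name:
            pids.append(int(pid))
    return pids


def running_daemon_pids(name="nagios"):
    """Run ps and return the PIDs of processes named `name` right now."""
    result = subprocess.run(
        ["ps", "-eo", "pid,comm"],
        capture_output=True,
        text=True,
        check=True,
    )
    return find_daemon_pids(result.stdout, name)
```

Call running_daemon_pids() a few times after a reload. Nagios may fork short-lived children while it runs checks, so a count above one only looks suspicious if the same extra PID is still there every time you look.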
I searched around and all I could find was advice to make sure the plugin is in the right location (it is), that the permissions are correct (they are), and that it actually exists (it does). I don't use the restart command; I use rcnagios reload to reload the config after I make a change. This has only been happening since I upgraded to v2.
Look at the checkcommands.cfg file to see where the plugin is supposed to be, and check that the file is really there with execute permission for the nagios user.
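For what it's worth, 127 is the exit status a POSIX shell gives when it cannot find the command it was asked to run, which is why this error usually points at a missing plugin. Here is a small Python sketch of both checks, assuming the check command is launched through a shell. The function names are invented for this example, and it only looks at the file's mode bits, so it does not prove that the nagios user in particular can execute the file.

```python
import os
import stat
import subprocess


def plugin_problems(path):
    """Return a list of reasons why `path` could not be run as a plugin."""
    if not os.path.isfile(path):
        return ["missing or not a regular file"]
    problems = []
    mode = os.stat(path).st_mode
    # Looks for any execute bit (owner, group or other); it does not check
    # which user owns the file or which user would be running it.
    if not mode & (stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH):
        problems.append("no execute bit set")
    return problems


def shell_exit_code(command_line):
    """Run `command_line` through sh and return the shell's exit status."""
    result = subprocess.run(
        ["sh", "-c", command_line],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode
```

Pass plugin_problems() the path from your command definition; an empty list means the file exists and has an execute bit. Running shell_exit_code() on the full command line as the nagios user shows what the shell itself returns for it.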
I am sure this is all down to restarting Nagios. I think the FAQ on the Nagios site suggests using “Shutdown the Nagios Process” and then manually starting it again from the command line. I haven’t had this error since I started doing that.