Nagios misreporting


#1

Our company has been put into a situation where both employees who installed and configured Nagios left the organization within a very short time of each other. Another employee and I are new to Unix, but we can navigate around a bit and do some things.

Nagios has been working well for us up until this point, so I’m trying to hold off on exploring new monitoring software even though our director has implored me to do so. On a daily basis we see alerts from one particular Unix server and one particular Windows server that are reporting incorrectly. It starts with one alert for a service being down. After this first alert, all the other services Nagios monitors on that server come back and alert us that they are critical or down. We check the server and everything is running fine, with no problems. It seems to happen around the same time every day. Might reinstalling the agent help? If so, how do I do this? Could this be a different problem? Any help would be appreciated!


#2

Sounds like you haven’t read about Nagios at all. Normally, there are no agents. If there are, they could be any of several things, such as ntclients, external Nagios installations using nsca, etc.
From your description, it sounds like you might simply be having a problem pinging the device at some point, and after that the other service checks get backed up, probably due to excessive timeouts or something similar.
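
If you want to confirm that the bad alerts really do cluster around one time of day before digging into timeouts, something like the rough Python sketch below could help. It is only an illustration: it assumes your Nagios log has lines of the form `[epoch] SERVICE ALERT: host;service;STATE;...` and `[epoch] HOST ALERT: host;STATE;...` (check a few lines of your own log first), the function name `alerts_by_hour` is made up, and it buckets by UTC hour, so adjust for your local time zone.

```python
import re
import sys
from collections import Counter
from datetime import datetime, timezone

# Assumed log line shape: "[epoch] HOST ALERT: ..." or "[epoch] SERVICE ALERT: ..."
ALERT_RE = re.compile(r"^\[(\d+)\] (HOST|SERVICE) ALERT: (.*)$")
BAD_STATES = {"DOWN", "UNREACHABLE", "CRITICAL"}


def alerts_by_hour(lines):
    """Count problem alerts per hour of day (UTC). Hypothetical helper."""
    counts = Counter()
    for line in lines:
        match = ALERT_RE.match(line.strip())
        if not match:
            continue  # not an alert line
        epoch, kind, rest = match.groups()
        fields = rest.split(";")
        # HOST ALERT: host;STATE;...   SERVICE ALERT: host;service;STATE;...
        state_index = 1 if kind == "HOST" else 2
        if len(fields) <= state_index or fields[state_index] not in BAD_STATES:
            continue
        hour = datetime.fromtimestamp(int(epoch), tz=timezone.utc).hour
        counts[hour] += 1
    return counts


if __name__ == "__main__":
    # Usage: python alerts_by_hour.py /path/to/your/nagios.log
    with open(sys.argv[1]) as log_file:
        for hour, count in sorted(alerts_by_hour(log_file).items()):
            print(f"{hour:02d}:00 UTC  {count} problem alerts")
```

If nearly all the alerts land in the same hour, look at what else runs on those servers or on the network at that time (backups, scheduled jobs, and so on).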

Since you are new to Nagios, you have no choice but to start from the beginning, i.e. the docs:
nagios.sourceforge.net/docs/1_0/