Ping Check Failed - Server is UP and Reachable

TheMel · March 28, 2011, 8:28pm

As the new guy around here I have inherited a Nagios infrastructure, and I have no experience with the platform whatsoever. My dilemma is:

I have a few IBM AIX servers that Nagios monitors. All of them use EtherChannel interfaces. Both legs of each EtherChannel are alive, providing a fat pipe for network operations.

Every once in a while, I was told (I have not seen it with my own eyes), pinging the IP address of any of these AIX servers yields more return packets than packets sent. I was also told that this is because the Cisco switches and routers act abnormally at times, but we have no control over the network operations group, so I cannot even suggest that they fix their part. All I can do is either modify the Nagios server to ignore a return packet count higher than 100%, or modify the client being pinged so that it does not send multiple responses to the same packet. The problem is that I don’t know how to accomplish either task, and I need a crash course in Nagios.

My initial questions are:

Who is in charge of (initiates) the ping connectivity check task? The Nagios central server or the client?
How does it decide whether it has succeeded or failed? Is there any way to change the success or failure criteria?

Thank you in advance

Mel

luca · March 29, 2011, 8:51pm

I think I’d go and crash the switch with a hammer; at least it would be fun. (And with some real data at hand I’d go to the webpeople and tell them FIX IT, or have your boss go there and tell it to their boss.)

Anyway, the Nagios server initiates the check. It simply runs the check_ping plugin (usually in /usr/local/nagios/libexec) with a couple of parameters (IP, critical and warning thresholds).
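
To make the threshold idea concrete, here is a rough Python sketch of how a ping check could turn a measured round trip time and packet loss into OK, WARNING or CRITICAL. It is not the source of check_ping (the real plugin is a compiled program). The function names are made up for illustration, and the inclusive comparison is an assumption. The "rta,pl%" threshold format is the one check_ping accepts for its -w and -c options.

```python
# Illustrative sketch only: this does not reproduce the check_ping source.
# parse_threshold and classify are hypothetical names.

OK, WARNING, CRITICAL = 0, 1, 2  # standard Nagios plugin exit codes


def parse_threshold(text):
    """Split an 'rta,pl%' threshold such as '100.0,20%' into (rta_ms, loss_pct)."""
    rta, loss = text.split(",")
    return float(rta), float(loss.strip().rstrip("%"))


def classify(rta_ms, loss_pct, warn, crit):
    """Map a measured round trip time (ms) and packet loss (%) to a state.

    Assumption: a value equal to a threshold already trips it.
    """
    warn_rta, warn_loss = parse_threshold(warn)
    crit_rta, crit_loss = parse_threshold(crit)
    # Critical is checked first so that the worse state always wins.
    if rta_ms >= crit_rta or loss_pct >= crit_loss:
        return CRITICAL
    if rta_ms >= warn_rta or loss_pct >= warn_loss:
        return WARNING
    return OK
```

With thresholds of "100.0,20%" for warning and "500.0,60%" for critical, a host answering in under a millisecond with no loss comes out OK, while total packet loss comes out CRITICAL. Changing the success or failure criteria then amounts to changing those two threshold arguments in the command definition rather than touching the plugin.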

Are you sure you are having a problem in Nagios? Usually you should ignore at least a single false positive, which makes modifying the plugin quite useless.
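
Ignoring a single false positive is normally handled by retries: Nagios has a max_check_attempts setting, and a problem only counts as "hard" after that many consecutive failed checks. The sketch below is a simplified model of that idea, not Nagios's actual state machine. The function name and the default of 3 attempts are assumptions for illustration.

```python
# Simplified model of retry-before-alert; first_hard_problem is a hypothetical name.

def first_hard_problem(results, max_check_attempts=3):
    """Return the index of the check at which a problem becomes 'hard', or None.

    results is a sequence of booleans, True meaning the check passed.
    """
    consecutive_failures = 0
    for index, passed in enumerate(results):
        if passed:
            # Any successful check resets the failure counter.
            consecutive_failures = 0
            continue
        consecutive_failures += 1
        if consecutive_failures >= max_check_attempts:
            return index
    return None
```

Under this model a single odd ping result between good ones never reaches the hard state, so an occasional strange reply should not raise an alert on its own.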