I have about 30 services being checked every 10 minutes. However, a large number of them return UNKNOWN. The status information shows /bin/ping -n -U -w 10 -c 5 (ip address). I can run the same check with the same command from the command line and it returns OK. A plain ping returns the same result. I have set it so only one check can run at a time, so I don’t think the server is being flooded with requests.
Only a large number fail? So you are saying some of your ping checks work and some don’t? If that is the case, are they all defined identically? If so, I have no idea. Perhaps include your check command definitions from the config file and anything else you think might help, such as the “check_command” line for the services that fail.
Here is an example of one that does it often. Below is the host, after that the service, and finally the command. The service bombs out with an UNKNOWN state, and the status line says /bin/ping -n -U -w 10 -c 5 (ip address). It does this at sporadic times. It is inconsistent: sometimes it returns OK, sometimes WARNING (if the return rate is low enough), but often it is UNKNOWN.
Some of them WILL work all the time (for some I have set the IP to 127.0.0.1 for the time being). The others fail at different times. It’s as if it’s switching scripts or something. (I looked at the other post and it didn’t help.)
define host{
        host_name               ChaffeeB4
        alias                   Chaffee B-Wing Short Bottom Switch
        parents                 ChaffeeB3
        address                 {insert ip here of the switch}
        check_command           check-host-alive
        check_interval          0
        max_check_attempts      5
        contact_groups          SAMIS
        notification_interval   0
        notification_period     days
        notification_options    u,d,r
        }
It works every time I run it from the command line; however, when Nagios runs it, the service status shows this:
Current Status: UNKNOWN
Status Information: /bin/ping -n -U -w 10 -c 5 132.178.192.25
Performance Data:
Current Attempt: 5/5
State Type: HARD
Last Check Type: ACTIVE
Last Check Time: 28-02-2005 13:39:24
Status Data Age: 0d 0h 5m 43s
Next Scheduled Active Check: 28-02-2005 13:49:24
Latency: 20.365 seconds
Check Duration: 0.033 seconds
Last State Change: 28-02-2005 05:54:25
Current State Duration: 0d 7h 50m 42s
Last Service Notification: N/A
Current Notification Number: 0
Is This Service Flapping? NO
Percent State Change: 0.00%
In Scheduled Downtime? NO
Last Update: 28-02-2005 13:44:59
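One way to test the “works from the command line” claim more strictly is to look at the exit code and the first output line, which is what Nagios acts on, rather than only at whether ping prints replies. The sketch below is a hypothetical helper (run_check is not part of Nagios): it splits the command with shlex instead of going through a shell, maps exit codes 0 to 3 to OK, WARNING, CRITICAL and UNKNOWN per the Nagios plugin convention, and assumes, as in the Nagios 1.x/2.x releases of that era, that only the first line of standard output becomes the Status Information.

```python
import shlex
import subprocess
import sys

# Plugin exit codes as Nagios interprets them.
STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def run_check(command_line, timeout=30):
    """Run a check command and return (state, exit_code, status_text).

    status_text is the first line of standard output; this sketch assumes
    only that line ends up in Status Information.
    """
    args = shlex.split(command_line)  # no shell: macros must be expanded by hand
    if not args:
        return "UNKNOWN", 3, "empty command line"
    try:
        proc = subprocess.run(
            args,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return "UNKNOWN", 3, "check timed out after %d seconds" % timeout
    except OSError as exc:
        # e.g. the plugin path in the command line does not exist
        return "UNKNOWN", 3, "could not run command: %s" % exc
    lines = proc.stdout.splitlines()
    status_text = lines[0] if lines else ""
    # This sketch maps any exit code outside 0-3 to UNKNOWN.
    return STATES.get(proc.returncode, "UNKNOWN"), proc.returncode, status_text


if __name__ == "__main__" and len(sys.argv) == 2:
    state, code, text = run_check(sys.argv[1])
    print("%s (exit %d): %s" % (state, code, text))
```

Pass the whole command line as one quoted argument, with macros such as $USER1$ and $HOSTADDRESS$ expanded by hand, for example python3 run_check.py '/usr/local/nagios/libexec/check_ping -H 132.178.192.25 -w 3000.0,80% -c 5000.0,100% -p 1' (the script name is hypothetical, and /usr/local/nagios/libexec is only the default plugin location). Running it as the same user the Nagios daemon runs as brings the test closer to what Nagios actually sees.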
Status Information: PING CRITICAL - Host Unreachable
The above is what I get when a device cannot be pinged.
I’m concerned that your Status Information line says /bin/ping and not “PING”.
Please go through your config files to see whether you have the check_command defined twice. Did you run Nagios with the -v switch to verify the config files?
I don’t have it defined twice anywhere that I can see. I am using the stock misccommands.cfg and checkcommands.cfg that come with Nagios for all of my commands. I’m only using two other files: the default resource.cfg and my own object file containing the hosts, services, and contacts.
I have tried the -v switch and it never throws any problems my way.
nagios -v isn’t going to complain about a command definition of /bin/ping when it is supposed to be $USER1$/check_ping.
Honestly, you have a command defined somewhere in your .cfg files that says /bin/ping. Please look again.
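Rather than eyeballing the files, both questions in this exchange (a command defined twice, or a stray /bin/ping command line) can be checked mechanically. This is a rough sketch, not a Nagios tool: audit and parse_commands are hypothetical names, and the script reads the .cfg files given on the command line (ideally every file nagios.cfg loads through cfg_file or cfg_dir), reports any command_name defined more than once, and lists every command_line containing /bin/ping. It only skips whole-line # and ; comments and assumes no “}” appears inside a definition.

```python
import re
import sys
from collections import defaultdict

# A "define command { ... }" block; the body is everything up to the first "}".
DEFINE_RE = re.compile(r"define\s+command\s*\{(.*?)\}", re.DOTALL)


def strip_comments(text):
    """Drop whole-line comments starting with '#' or ';'."""
    return "\n".join(
        line for line in text.splitlines()
        if not line.lstrip().startswith(("#", ";"))
    )


def parse_commands(path):
    """Yield (command_name, command_line) for each command defined in path."""
    with open(path) as fh:
        text = strip_comments(fh.read())
    for body in DEFINE_RE.findall(text):
        fields = {}
        for line in body.splitlines():
            parts = line.split(None, 1)  # directive name, then its value
            if len(parts) == 2:
                fields[parts[0]] = parts[1].strip()
        yield fields.get("command_name"), fields.get("command_line")


def audit(paths, needle="/bin/ping"):
    """Return (duplicates, matches) across all given .cfg files.

    duplicates maps each command_name defined more than once to the files
    defining it; matches lists (path, command_name, command_line) for every
    command_line containing needle.
    """
    seen = defaultdict(list)
    matches = []
    for path in paths:
        for name, line in parse_commands(path):
            if name:
                seen[name].append(path)
            if line and needle in line:
                matches.append((path, name, line))
    duplicates = {name: files for name, files in seen.items() if len(files) > 1}
    return duplicates, matches


if __name__ == "__main__" and len(sys.argv) > 1:
    dups, hits = audit(sys.argv[1:])
    for name, files in sorted(dups.items()):
        print("command_name %s defined in: %s" % (name, ", ".join(files)))
    for path, name, line in hits:
        print("%s: %s -> %s" % (path, name, line))
```

For example, python3 audit_commands.py /usr/local/nagios/etc/*.cfg (the script name and directory are assumptions). An empty report means neither a duplicate name nor a literal /bin/ping command line was found in the files passed in.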
I had the same problem before. I tried reconfiguring the plugin (version 1.4). Configure complained about the ping syntax the first time I ran it, but the second time it ran smoothly, and now the check works.
Finally got it figured out: it looks like a bad NIC or hub. I am currently working at a different site that has a Nagios setup that has been around for years. We moved some switches to a new subnet, and those old definitions are returning the same thing I was getting. It looks like Nagios has trouble parsing the error message, so it just returns the command.
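The final diagnosis, that the check cannot parse ping’s error output and falls back to echoing the command, can be illustrated with a small parser. This is a hypothetical sketch of that failure mode, not the actual check_ping source: evaluate_ping and its thresholds are invented for illustration, and the regular expressions target the summary lines printed by Linux iputils ping. Output with no packet-loss summary comes back as UNKNOWN with the command line as the status text, while 100% loss is deliberately mapped to CRITICAL so that a missing round-trip line is not mistaken for a parse failure.

```python
import re

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Summary lines printed by Linux iputils ping, for example:
#   5 packets transmitted, 5 received, 0% packet loss, time 4005ms
#   rtt min/avg/max/mdev = 0.041/0.052/0.067/0.009 ms
LOSS_RE = re.compile(r"([\d.]+)% packet loss")
RTA_RE = re.compile(r"= [\d.]+/([\d.]+)/")


def evaluate_ping(command, output, warn=(100.0, 20.0), crit=(500.0, 60.0)):
    """Map raw ping output to (exit_code, status_text).

    warn and crit are hypothetical (round-trip average ms, packet loss %)
    thresholds. Output this function does not recognise is reported as
    UNKNOWN with the command line as the status text.
    """
    loss_match = LOSS_RE.search(output)
    if loss_match is None:
        # No summary line at all (e.g. "connect: Network is unreachable"):
        # fall back to echoing the command, the symptom seen in this thread.
        return UNKNOWN, command
    loss = float(loss_match.group(1))
    if loss >= 100.0:
        # No replies means no rtt line; report CRITICAL instead of
        # treating the missing round-trip time as a parse failure.
        return CRITICAL, "PING CRITICAL - Packet loss = 100%"
    rta_match = RTA_RE.search(output)
    if rta_match is None:
        return UNKNOWN, command
    rta = float(rta_match.group(1))
    if rta >= crit[0] or loss >= crit[1]:
        code, label = CRITICAL, "CRITICAL"
    elif rta >= warn[0] or loss >= warn[1]:
        code, label = WARNING, "WARNING"
    else:
        code, label = OK, "OK"
    return code, "PING %s - Packet loss = %g%%, RTA = %.2f ms" % (label, loss, rta)
```

A parser that insisted on finding both the loss line and the round-trip line before deciding anything would report a completely unreachable host as UNKNOWN rather than CRITICAL, which is one plausible way a network fault could surface as the symptom described in this thread.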