Well… I still haven’t found an answer… except I am getting a different error now… and I made no changes…
CHECK_NRPE: Socket timeout after 10 seconds
I can still launch check_nrpe by hand and have no problems… I have added more hosts, including Sun servers. Out of 9 Sun servers, I have 2 that I am getting timeouts on.
It’s a little difficult to see exactly what the issue is, but if you’re able to run it by hand but not in Nagios, I’d say it’s a config issue. Can I see the NRPE configs for your non-working hosts?
28/29 work, so it has to be a config problem on either end. My guess is that your /etc/xinetd.d/nrpe file is missing an IP address that is allowed to connect. You must also restart the xinetd service after the change.
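For reference, a typical xinetd entry for NRPE looks roughly like the sketch below. The install paths and the 192.0.2.10 address are placeholders, not values from this thread. The line that matters here is only_from, which has to list the address the Nagios server connects from.

```
# /etc/xinetd.d/nrpe (sketch; paths and addresses are assumptions)
service nrpe
{
        flags           = REUSE
        socket_type     = stream
        port            = 5666
        wait            = no
        user            = nagios
        group           = nagios
        server          = /usr/local/nagios/bin/nrpe
        server_args     = -c /usr/local/nagios/etc/nrpe.cfg --inetd
        log_on_failure  += USERID
        disable         = no
        # 192.0.2.10 stands in for the Nagios server's IP address
        only_from       = 127.0.0.1 192.0.2.10
}
```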
I don’t think it is a config problem… I run check_nrpe from the Nagios server and I get a result back; this is the identical check Nagios is running for the other servers. The problem is that when Nagios runs the check for these servers (now 3, since I added 9 more), I get a socket timeout. I can still run the check manually from the Nagios server and get a good result.
The fact that it works when you run it by hand means it has to be a config problem. The binary is not broken, and if you are running this command as user nagios, then the permissions on the file are correct.
It seems to me that you have an error, such as a typo, in your config files.
For example:
your hostname is test.com
but your config file has it defined as tst.com
Please double check everything and copy/paste some of your stuff here, so we can look at it.
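One way to compare what you type by hand with what Nagios actually runs is to look at the command and service definitions. Here is a minimal sketch, assuming the usual check_nrpe command definition. The generic-service template, the service description, and the check_load command name are assumptions for illustration, not taken from your config.

```
# Sketch only: names other than host_name are hypothetical
define command {
    command_name    check_nrpe
    command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}

define service {
    use                     generic-service
    host_name               prddb880-1.budco.com
    service_description     CPU Load
    check_command           check_nrpe!check_load
}
```

In this sketch $HOSTADDRESS$ expands to the address value from the host definition, so a typo there would send the check to the wrong place even though the same command typed by hand with the correct name works.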
I have tried running the check both as root and as the nagios user… both work fine when I run it manually…
here are some of the confs in question…
define host {
    use             Sun
    parents         fas940
    host_name       prddb880-1.budco.com
    alias           erp/bls DB Server
    address         prddb880-1.budco.com
    check_command   check-host-alive
    hostgroups      Sun
    contact_groups  LAdmins
}