NRPE Strange problem

madbuda · April 11, 2006, 12:52pm

I have 29 servers that are identical in packages, OS level, everything (RH ES4).

I have nagios monitoring them using NRPE. 28 of them work.

I get "Connection refused" on one server (tried both xinetd and standalone).

When I run check_nrpe manually, I get good results.

[root libexec]# ./check_nrpe -H bpid-02 -c check_disk1
DISK OK - free space: / 500 MB (34%);| /=974MB;1454;1464;0;1474

[1144728000] CURRENT SERVICE STATE: bpid-02.budco.com;Check /;CRITICAL;HARD;4;Connection refused by host

Any help would be appreciated.

madbuda · April 13, 2006, 2:23pm

Well… I still haven’t found an answer… except I am getting a different error now… and I made no changes…

CHECK_NRPE: Socket timeout after 10 seconds

I can still launch check_nrpe by hand and have no problems… I have added more hosts, including Sun servers. Out of 9 Sun servers, I have 2 that I am getting timeouts on.

help plz :shock:

system · April 13, 2006, 8:13pm

It’s a little difficult to see exactly what the issue is, but if you’re able to run it by hand but not in Nagios, I’d say it’s a config issue. Can I see the NRPE configs for your non-working hosts?

jakkedup · April 13, 2006, 9:14pm

28/29 work, so it has to be a config problem on either end. My guess is that your /etc/xinetd.d/nrpe file is missing an IP that is allowed to connect. And you must restart the xinetd service after the change.
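
A minimal sketch of how the allowed addresses could be checked on the monitored host. The function name `show_only_from` is made up for illustration; it only prints the active `only_from` lines from the xinetd service file so they can be compared with the Nagios server's IP.

```shell
#!/bin/sh
# Print the active only_from line(s) from an xinetd service file.
# show_only_from is a hypothetical helper name, not part of NRPE or xinetd.
# Default path is the file mentioned above; pass another path to override it.
show_only_from() {
    conf="${1:-/etc/xinetd.d/nrpe}"
    if [ ! -r "$conf" ]; then
        echo "cannot read $conf" >&2
        return 1
    fi
    # Commented-out lines are skipped because the pattern is anchored to
    # optional whitespace followed directly by only_from ("=" or "+=").
    grep -E '^[[:space:]]*only_from[[:space:]]*\+?=' "$conf"
}

# Example (on the monitored host, as root):
#   show_only_from
#   service xinetd restart    # needed after editing the file
```

If NRPE runs as a standalone daemon instead of under xinetd, the comparable setting is `allowed_hosts` in nrpe.cfg.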

madbuda · April 14, 2006, 10:03am

I don’t think it is a config problem… I run check_nrpe from the Nagios server and I get a result back; this is the identical check Nagios is running for the other servers. The problem is that when Nagios runs the check for these (now 3) servers (I added 9 more), I get a socket timeout. I can still run the check manually from the Nagios server and get a good result.

luca · April 14, 2006, 10:41am

I suppose you are running the check by hand as user nagios…

Luca

jakkedup · April 14, 2006, 2:17pm

The fact that it works when you run it by hand means it has to be a config problem. The binary is not broken, and if you are running this command as user nagios, then the permissions on the file are correct.
It seems to me that you have an error in your config files.

For example:
your hostname is test.com
but your config file has it defined as tst.com
Please double check everything and copy/paste some of your stuff here, so we can look at it.

madbuda · April 14, 2006, 4:38pm

I have tried both running the check as root, and nagios user… both work fine when I run it manually…

here are some of the confs in question…

define host {
    use             Sun
    parents         fas940
    host_name       prddb880-1.budco.com
    alias           erp/bls DB Server
    address         prddb880-1.budco.com
    check_command   check-host-alive
    hostgroups      Sun
    contact_groups  LAdmins
}

define command{
    command_name    check_sun_disk1
    command_line    $USER1$/check_nrpe -n -H $HOSTADDRESS$ -t 30 -c check_disk1
}

define service {
    use                  check_sun_disk1
    service_description  /
    check_command        check_sun_disk1
    host_name            prddb880-1.budco.com
    servicegroups        Sun
    contact_groups       LAdmins
}
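
To compare a manual test with what Nagios actually runs, the macros in the command_line above can be expanded by hand. This is only a sketch: `expand_command_line` is a made-up helper, and the plugin directory used for `$USER1$` in the example is an assumption (the real value is set in resource.cfg).

```shell
#!/bin/sh
# Expand $USER1$ and $HOSTADDRESS$ in a Nagios command_line so the exact
# command can be pasted into a shell. Nothing is read from the Nagios
# config; both values are supplied by the caller.
# expand_command_line is a hypothetical helper, not a Nagios tool.
expand_command_line() {
    template="$1"   # command_line as written in the command definition
    user1="$2"      # value of $USER1$ (plugin directory, see resource.cfg)
    hostaddr="$3"   # value of $HOSTADDRESS$ (the host's "address" field)
    printf '%s\n' "$template" |
        sed -e 's|\$USER1\$|'"$user1"'|g' \
            -e 's|\$HOSTADDRESS\$|'"$hostaddr"'|g'
}

# Example, assuming plugins live in /usr/local/nagios/libexec:
#   expand_command_line \
#       '$USER1$/check_nrpe -n -H $HOSTADDRESS$ -t 30 -c check_disk1' \
#       /usr/local/nagios/libexec prddb880-1.budco.com
```

The expanded command includes `-n` (which tells check_nrpe not to use SSL), `-t 30`, and the fully qualified name, none of which appear in the manual test at the top of the thread (`./check_nrpe -H bpid-02 -c check_disk1`). Running the expanded form as user nagios is a closer reproduction of the scheduled check.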

madbuda · April 14, 2006, 4:39pm

Also, all of my servers are running NRPE as a standalone daemon.

jakkedup · April 14, 2006, 6:48pm

address prddb880-1.budco.com

You should never use hostnames in the address field in Nagios unless you have no other choice. Add the name to the /etc/hosts file, or else use the IP.
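
A quick way to see whether a name is already in the hosts file on the Nagios server, sketched as a small shell function. `in_hosts_file` is a hypothetical helper and does an exact, case-sensitive match on the hostname and alias columns.

```shell
#!/bin/sh
# Print the matching line and return 0 if NAME appears as a hostname or
# alias in a hosts-format file (default /etc/hosts); return 1 otherwise.
# in_hosts_file is a made-up helper name used only for illustration.
in_hosts_file() {
    name="$1"
    file="${2:-/etc/hosts}"
    awk -v n="$name" '
        { sub(/#.*/, "") }              # strip comments before matching
        NF >= 2 {
            # Field 1 is the IP; fields 2..NF are the name and its aliases.
            for (i = 2; i <= NF; i++)
                if ($i == n) { print; found = 1; next }
        }
        END { exit found ? 0 : 1 }
    ' "$file"
}

# Example:
#   in_hosts_file prddb880-1.budco.com || echo "not in /etc/hosts"
```

If the name is missing, an entry has the form `IP-address name [alias ...]`, one host per line, using the host's real IP.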