Monitoring remote process exit status from Nagios


#1

I have Nagios running on Fedora Linux and a Java process running on RedHat.
From time to time I need to check whether a specific Java process finished successfully, and collect statistics about it. I have a Perl script for that.
When I run my Perl script remotely from Fedora using ‘rsh’, I can access the process and get the results, but I am asked to type a password.
rsh mesmes -l user123 'perl /staging/monitorW.pl'
If I try to run the same command automatically under Nagios, I get “permission denied”.

I have a number of similar processes to monitor on an AIX Unix host.
I set up the config files the same way between the Fedora and AIX Unix machines, and I do not have this problem there. So what could the problem be between RedHat and Fedora?

1) On both of the hosts the same user ‘user123’ is defined
2) The file /etc/hosts:

cat /etc/hosts
127.0.0.1 mesmes
12.12.2.181 nagios
cat /etc/hosts
127.0.0.1 nagios
12.12.2.182 mesmes

3) The .rhosts file resides in a user’s home directory and specifies the remote machines and remote user names that the user may use to remotely log in to the local machine.

cat .rhosts
mesmes user_name
nagios user_name

4) After making these changes, I restarted the ‘rsh’ service

Thanks, :shock:


#2

Sounds like you need to use nrpe so your nagios server can execute your plugin remotely.

You can find nrpe at: http://www.nagios.org/download/

Keep in mind that the version of check_nrpe that runs on the nagios server has to be the same as the version of nrpe that runs on the remote host you are monitoring, or it won’t work.


#3

Thanks for the advice.
I installed nrpe and configured it, but I still cannot run my Perl script, or even other Nagios plug-ins like check_local_load, from the command line in Linux. Both hosts are configured in nagios, and the nagios Web interface shows all the statistics for the host where I am trying to run my script through nrpe.
I am getting
CHECK_NRPE: Error receiving data from daemon.
when I try to run my command:
./check_nrpe -n -H 10.10.21.159 -p 5666 -c "/usr/bin/perl /data/ntsr/SPMF/monitorSpMF.pl"

The latest version of Nagios is installed on Fedora. The process I am trying to monitor is on RedHat.


#4

Getting an “Error receiving data from daemon.” message may mean that the allowed_hosts property in the nrpe.cfg of the remote host you are monitoring does not include the IP address of your nagios server.

It may also mean that the version of check_nrpe on your nagios server is not the same version as nrpe on the remote host you are monitoring.

One other thing to note: the -c qualifier for check_nrpe (defined on the nagios server) has to match the command mnemonic which is specified in nrpe.cfg on the remote host you are monitoring.

I.e. if you are trying check_nrpe from the nagios server manually, -c needs to match the string specified in the command[…] declaration in nrpe.cfg on the remote host you are monitoring.

I mention this because you have specified it as a statement executed on the command line, and that should be in the nrpe.cfg instead.

If this doesn’t help, maybe if you post the relevant service and check command definitions on your nagios server and the command definition on the remote host, that may help clear things up.
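To make the -c point concrete, here is a minimal sketch that lists the names check_nrpe could be given, by pulling whatever sits inside the square brackets of each command definition. The helper name list_nrpe_commands and the sample file are made up for illustration; on the remote host you would point it at the real nrpe.cfg.

```shell
#!/bin/sh
# Sketch: list the command names defined in an nrpe.cfg-style file.
# list_nrpe_commands is a hypothetical helper, not part of NRPE.
list_nrpe_commands() {
    # $1 = config file. Prints the text between "command[" and "]"
    # for every uncommented command definition line.
    sed -n 's/^[[:space:]]*command\[\([^]]*\)].*/\1/p' "$1"
}

# Demo against a throwaway sample file (not a real configuration).
sample=$(mktemp)
cat > "$sample" <<'EOF'
# sample only
allowed_hosts=127.0.0.1
command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10
command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
EOF

list_nrpe_commands "$sample"
```

Any string passed to -c that is not in that list (for example a full command line) will not match a definition on the remote host.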


#5

Did you turn off the SELinux firewall and try it?


#6

I checked the configuration on both of the machines:
Configuring on the Remote Host with the ‘nrpe’ daemon

  1. Getting “Error receiving data from daemon.” may mean that the allowed_hosts property in the nrpe.cfg of the remote host you are monitoring does not include the IP address of your nagios server.
    The nagios server is included in the nrpe.cfg on mesmes:
    allowed_hosts=127.0.0.1, 12.12.2.181

  2. It also may mean that the version of check_nrpe on your nagios server is not the same version as nrpe on the remote host you are monitoring.

The same version of nrpe installed on both machines:
on mesmes nrpe : Version: 2.5.1
on nagios nrpe: Version: 2.5.1

NRPE - Nagios Remote Plugin Executor
Copyright © 1999-2006 Ethan Galstad (nagios@nagios.org)
Version: 2.5.1
Last Modified: 04-09-2006
Usage: nrpe [-n] -c <config_file>

  3. One other thing to note: the -c qualifier for check_nrpe (defined on the nagios server) has to match the command mnemonic which is specified in nrpe.cfg on the remote host you are monitoring.
    I.e. if you are trying check_nrpe from the nagios server manually, -c needs to match the string specified in the command[…] declaration in nrpe.cfg on the remote host you are monitoring.
    I mention this because you have specified it as a statement executed on the command line, and that should be in the nrpe.cfg instead.

In the nrpe.cfg on the nrpe daemon host the command is defined the following way:

command[check_nrpe]=/usr/local/nagios/libexec/check_nrpe -H $HOSTADDRESS$ -c $ARG1$

I am using xinetd:

[root@mesmes etc]# vi /etc/xinetd.d/nrpe
service nrpe
{
flags = REUSE
disable = no
port = 5666
socket_type = stream
wait = no
user = root
server = /usr/local/nagios/bin/nrpe
server_args = -n -d -c /user/local/nagios/etc/nrpe.cfg
log_on_failure += USERID
only_from = 12.12.2.181
}

File /etc/services:
nrpe 5666/tcp # Nagios – nrpe

[root@mesmes etc]# /etc/rc.d/init.d/xinetd restart
Stopping xinetd: [ OK ]
Starting xinetd: [ OK ]
[root@mesmes etc]# /etc/rc.d/init.d/xinetd status
xinetd (pid 13888) is running…

[root@mesmes etc]# netstat -pta
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name

tcp 0 0 *:nrpe *:* LISTEN 13888/xinetd

# hosts.allow This file describes the names of the hosts which are

nrpe: LOCAL, 12.12.2.181

[root@mesmes etc]# vi /etc/hosts.deny

# hosts.deny

nrpe: ALL

Configuring on the Nagios Host

In the “checkcommands.cfg”

‘check_nrpe’ command definition

define command{
command_name check_nrpe
command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}

I have not even defined a nagios service yet, because I still get the same error when I run it from the command line on the nagios machine:
CHECK_NRPE: Error receiving data from daemon.


#7

The SELinux firewall is turned off.


#8

…looks like a change to nrpe.cfg on mesmes (the remote host you are monitoring that runs nrpe) is in order.

You stated the command definition is:

command[check_nrpe]=/usr/local/nagios/libexec/check_nrpe -H $HOSTADDRESS$ -c $ARG1$

From how you described it in your posts, it should more likely be:

command[monitorSpMF]=/usr/bin/perl /data/ntsr/SPMF/monitorSpMF.pl

after making this change and restarting nrpe on mesmes, log in to your nagios server and try the following manually and post the results when you can:

./check_nrpe -n -H 10.10.21.159 -p 5666 -c monitorSpMF

…the above suggested command assumes the IP address of the remote host mesmes you are trying to monitor is 10.10.21.159.


#9

Tried it.

  • added the new command into the nrpe.cfg on the remote host mesmes:
    command[monitorSpMF]=/usr/bin/perl /data/ntsr/SPMF/monitorSpMF.pl
  • restarted xinetd on mesmes
  • ran the command on the nagios machine:
    ./check_nrpe -n -H 10.10.21.159 -p 5666 -c monitorSpMF
    and I still get the same error

#10

…ok here are a couple of other ideas:

  1. you may need to put the IP address of your nagios server in /etc/hosts.allow on mesmes; it is possible that mesmes is blocking the requests from your nagios server

  2. log in as the user that runs nrpe on mesmes and try executing the command: /usr/bin/perl /data/ntsr/SPMF/monitorSpMF.pl
    this will determine if the user running nrpe has appropriate permissions to run this perl script. Post the output and the value of echo $? to see if the return code is ok
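For reading the value of echo $?, Nagios plugins signal their state through the exit code: 0 is OK, 1 is WARNING, 2 is CRITICAL and 3 is UNKNOWN. A small sketch of that mapping follows; plugin_state is a made-up helper name, and the sh -c 'exit 2' line only stands in for the real script.

```shell
#!/bin/sh
# Sketch: translate a plugin exit code into the Nagios state name.
# plugin_state is a hypothetical helper, not part of Nagios or NRPE.
plugin_state() {
    case "$1" in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        3) echo UNKNOWN ;;
        *) echo "INVALID($1)" ;;   # anything else is outside the plugin convention
    esac
}

# Stand-in for the real check: replace the sh -c line with
# /usr/bin/perl /data/ntsr/SPMF/monitorSpMF.pl when testing on mesmes.
rc=0
sh -c 'exit 2' || rc=$?
echo "exit code $rc means $(plugin_state "$rc")"
```

If the script exits with something outside 0 to 3, or prints nothing, that is worth fixing before looking at nrpe itself.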


#11
  • I have no problem running it locally as root:
    /usr/bin/perl /data/ntsr/SPMF/monitorSpMF.pl
  • when I check the exit status of the process with
    echo $?
    I get the right status
  • the nagios /etc/hosts.allow has the mesmes IP address:
    [root@nagios etc]# cat hosts.allow

# hosts.allow This file describes the names of the hosts which are
# allowed to use the local INET services, as decided
# by the '/usr/sbin/tcpd' server.

nrpe: 10.10.21.159 , LOCAL

I can also see statistics from this host in the nagios Web interface, as a result of running check_disk, check_load, etc.


#12

Even if I try to remotely run one of the commands defined in the nrpe.cfg,
such as:

command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10
command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
command[check_disk1]=/usr/local/nagios/libexec/check_disk -w 20 -c 10 -p /dev/hda1
command[check_disk2]=/usr/local/nagios/libexec/check_disk -w 20 -c 10 -p /dev/hdb1
command[check_zombie_procs]=/usr/local/nagios/libexec/check_procs -w 5 -c 10 -s Z
command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 150 -c 200

Using check_nrpe
./check_nrpe -n -H mesmes -c check_users
it gives the same error: CHECK_NRPE: Error receiving data from daemon.


#13

…I’m running out of ideas here, so maybe someone else can think of something.

  1. the /etc/hosts.allow on mesmes (not the nagios server) has to have the IP address of the nagios server or it won’t work; you gave the /etc/hosts.allow on the nagios server, not on mesmes

  2. the version of ./libexec/check_nrpe on the nagios server has to be the same as the version of nrpe on mesmes; can you double check this?

  3. are you running nrpe as root on mesmes? If not, then that might be part of the problem

one other thing that may be a longshot is that you are using the -n qualifier in check_nrpe which indicates to not use SSL; you might try it without this qualifier

…just as a tip, until you can sort this out, continue testing it by manually running ./libexec/check_nrpe from the command line on the nagios server, this way you eliminate the nagios configs as a potential cause of the problem


#14

…here’s something else you can double check:

  1. make sure the build qualifiers you gave when building nrpe match up with your configuration, e.g.:

./configure --with-nrpe-user=nagios --with-nrpe-group=nagiocmd --with-nrpe-port=5666

this requires the following:

a. that nrpe on mesmes run under user nagios with group nagiocmd and that it will use port 5666

b. when invoking ./libexec/check_nrpe the port # following the -p qualifier needs to be 5666

c. the server_port property in nrpe.cfg on mesmes also needs to be 5666

d. the nrpe_user and nrpe_group properties in nrpe.cfg on mesmes need to be nagios and nagiocmd respectively

you may also want to check the command_timeout in nrpe.cfg on mesmes and make sure it is at least 60
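The nrpe.cfg settings listed above can be read back in one pass with something like the sketch below. cfg_value is a made-up helper, and the sample file simply repeats the values from this post; it assumes plain name=value lines and tolerates spaces around the equals sign.

```shell
#!/bin/sh
# Sketch: read selected settings out of an nrpe.cfg-style file.
# cfg_value is a hypothetical helper, not part of NRPE.
cfg_value() {
    # $1 = config file, $2 = setting name.
    # Prints the value of the last uncommented "name=value" line, if any.
    sed -n "s/^[[:space:]]*$2[[:space:]]*=[[:space:]]*//p" "$1" | tail -n 1
}

# Demo against a throwaway sample file; use the real nrpe.cfg on mesmes instead.
sample=$(mktemp)
cat > "$sample" <<'EOF'
# server_port=9999
server_port=5666
nrpe_user=nagios
nrpe_group=nagiocmd
command_timeout=60
EOF

for key in server_port nrpe_user nrpe_group command_timeout; do
    echo "$key=$(cfg_value "$sample" "$key")"
done
```

Compare what it prints with the ./configure options and with the port given to check_nrpe after -p.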


#15

…one other thing, I noticed that in your /etc/xinetd.d/nrpe you specified server_args for the nrpe service as:

server_args = -n -d -c /user/local/nagios/etc/nrpe.cfg

…according to the nrpe 2.5.1 README it should be:

server_args = -c /usr/local/nagios/etc/nrpe.cfg --inetd

…maybe that has been the problem all along…
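Note that the two server_args lines also differ in the path itself (/user/local versus /usr/local), which may or may not be a slip made when posting. A rough way to check is to pull the path after -c out of the xinetd file and test whether it is readable. xinetd_cfg_path is a made-up helper, and the sample file below only repeats the lines quoted in this thread.

```shell
#!/bin/sh
# Sketch: find the config path that xinetd passes to nrpe and check it is readable.
# xinetd_cfg_path is a hypothetical helper, not part of xinetd or NRPE.
xinetd_cfg_path() {
    # $1 = xinetd service file.
    # Prints the word following "-c" on the server_args line.
    awk '$1 == "server_args" {
        for (i = 3; i < NF; i++) if ($i == "-c") print $(i + 1)
    }' "$1"
}

# Demo using the server_args line quoted above, written to a throwaway file.
# On mesmes, pass /etc/xinetd.d/nrpe instead.
sample=$(mktemp)
cat > "$sample" <<'EOF'
service nrpe
{
server = /usr/local/nagios/bin/nrpe
server_args = -n -d -c /user/local/nagios/etc/nrpe.cfg
}
EOF

path=$(xinetd_cfg_path "$sample")
if [ -r "$path" ]; then
    echo "readable: $path"
else
    echo "not readable: $path"
fi
```

If the path is reported as not readable on mesmes, nrpe has no config to load.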


#16

reah did you ever get the nrpe issue worked out? I am having the same problem and can’t get the nrpe daemon to return any data.