Cannot get useful info from remote plugin

ricmon · June 26, 2006, 6:58pm

I got the daemon running, but in the Status Information box all I get as returned output is “NRPE v2.5.1”. How can I get the desired output from the remote plugin?

define service{
use generic-service ; Name of service template to use
host_name tvp-control-01
service_description remote partition size
is_volatile 0
check_period 24x7
max_check_attempts 4
normal_check_interval 5
retry_check_interval 1
contact_groups admins
notification_options w,u,c,r
notification_interval 960
notification_period 24x7
check_command check_nrpe!check_disk1
}

define command{
command_name check_nrpe
command_line $USER1$/check_nrpe -H $HOSTADDRESS$
}

Commands defined in nrpe.cfg on the remote server:

command[check_users]=/usr/lib/nagios/plugins/check_users -w 5 -c 10
command[check_load]=/usr/lib/nagios/plugins/check_load -w 15,10,5 -c 30,25,20
command[check_disk1]=/usr/lib/nagios/plugins/check_disk -w 20 -c 10 -p /dev/hda1
command[check_disk2]=/usr/lib/nagios/plugins/check_disk -w 20 -c 10 -p /dev/hdb1
command[check_zombie_procs]=/usr/lib/nagios/plugins/check_procs -w 5 -c 10 -s Z
command[check_total_procs]=/usr/lib/nagios/plugins/check_procs -w 150 -c 200

sharamun · June 26, 2006, 8:06pm

In your command definition, you don’t have the port or the command argument (-c). You can also include a timeout value (-t), as in the following:

-p 5666 -t 180 -c $ARG1$

ricmon · June 27, 2006, 2:35pm

Thanks sharamun
Now I have another issue. Even though both servers are configured the same, I’m getting an error message:

Server crawl-01 gives the error “NRPE: Unable to read output”, while server control-01 has no problem returning output.

What exactly does this mean?

sharamun · June 27, 2006, 3:50pm

“NRPE: Unable to read output” likely means that the plugin didn’t provide any output. If I recall correctly, I’ve seen this problem when a plugin fails because of a missing path to a library that it needs.

Try tracking this down by executing the plugin manually, logged in as user nagios on crawl-01, with the same syntax as your command definition. Hopefully this will give you a clue.
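A minimal sketch of that manual test, assuming a POSIX shell on crawl-01. `run_check` is a hypothetical helper name, not part of Nagios or NRPE. It runs whatever command it is given, prints the captured output and exit status, and flags the empty-output case that sharamun suspects is behind “NRPE: Unable to read output”.

```shell
# run_check is a hypothetical helper, not part of Nagios or NRPE.
# It runs the given command, captures stdout and stderr together,
# and reports the exit status the plugin returned.
run_check() {
    output=$("$@" 2>&1)
    status=$?
    if [ -z "$output" ]; then
        echo "no output (exit status $status)"
    else
        echo "$output"
        echo "exit status: $status"
    fi
    return "$status"
}

# Usage on crawl-01, logged in as user nagios
# (plugin path and arguments taken from the nrpe.cfg posted above):
# run_check /usr/lib/nagios/plugins/check_disk -w 20 -c 10 -p /dev/hda1
```

If the plugin prints nothing, or only an error about a missing library, that points at the plugin's environment on crawl-01 rather than at the Nagios server.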

ricmon · June 27, 2006, 5:27pm

I tried your recommendation before posting.

Servers crawl-01/02 and control-01 share the same service definitions in the services.cfg file.

All config files were copied from a working install on control-01.

All RPMs are identical. What could be the problem?

define service{
    use                             generic-service         ; Name of service template to use
    host_name                       tvp-control-01,tvp-crawl-01,tvp-crawl-02
    service_description             Root Partition Size
    is_volatile                     0
    check_period                    24x7
    max_check_attempts              4
    normal_check_interval           5
    retry_check_interval            1
    contact_groups                  admins
    notification_options            w,u,c,r
    notification_interval           960
    notification_period             24x7
    check_command                   check_nrpe!check_disk1
    }

define service{
    use                             generic-service         ; Name of service template to use
    host_name                       tvp-control-01,tvp-crawl-01,tvp-crawl-02
    service_description             Total Number of Procs
    is_volatile                     0
    check_period                    24x7
    max_check_attempts              4
    normal_check_interval           5
    retry_check_interval            1
    contact_groups                  admins
    notification_options            w,u,c,r
    notification_interval           960
    notification_period             24x7
    check_command                   check_nrpe!check_total_procs
    }

define service{
    use                             generic-service         ; Name of service template to use
    host_name                       tvp-control-01,tvp-crawl-01,tvp-crawl-02
    service_description             System Load
    is_volatile                     0
    check_period                    24x7
    max_check_attempts              4
    normal_check_interval           5
    retry_check_interval            1
    contact_groups                  admins
    notification_options            w,u,c,r
    notification_interval           960
    notification_period             24x7
    check_command                   check_nrpe!check_load
    }

define command{
    command_name    check_nrpe
    command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -p 5666 -t 180 -c $ARG1$
    }

NAGIOS SCREEN

tvp-control-01
JBOSS OK 06-27-2006 12:48:41 0d 3h 8m 52s 1/4 HTTP OK HTTP/1.1 200 OK - 1774 bytes in 0.001 seconds

PING OK 06-27-2006 12:51:11 0d 15h 56m 9s 1/4 PING OK - Packet loss = 0%, RTA = 0.47 ms

Root Partition Size OK 06-27-2006 12:48:46 0d 0h 31m 11s 1/4 DISK OK - free space: / 3607 MB (92%):

System Load UNKNOWN 06-27-2006 12:51:16 0d 1h 45m 31s 4/4 Warning threshold must be float or float triplet!

Total Number of Procs OK 06-27-2006 12:48:52 0d 0h 31m 6s 1/4 PROCS OK: 67 processes

tvp-crawl-01
JBOSS OK 06-27-2006 12:51:22 12d 20h 59m 36s 1/4 HTTP OK HTTP/1.1 200 OK - 1774 bytes in 0.002 seconds

PING OK 06-27-2006 12:48:57 12d 22h 24m 27s 1/4 PING OK - Packet loss = 0%, RTA = 0.18 ms

Root Partition Size WARNING 06-27-2006 12:51:27 0d 0h 28m 27s 4/4 NRPE: Unable to read output

System Load WARNING 06-27-2006 12:49:02 0d 0h 30m 55s 4/4 NRPE: Unable to read output

Total Number of Procs WARNING 06-27-2006 12:51:32 0d 0h 28m 22s 4/4 NRPE: Unable to read output

tvp-crawl-02

PING OK 06-27-2006 12:49:07 12d 22h 23m 25s 1/4 PING OK - Packet loss = 0%, RTA = 0.20 ms

Root Partition Size WARNING 06-27-2006 12:49:37 0d 0h 7m 2s 4/4 NRPE: Unable to read output

System Load WARNING 06-27-2006 12:52:12 0d 0h 9m 27s 4/4 NRPE: Unable to read output

Total Number of Procs WARNING 06-27-2006 12:49:42 0d 0h 6m 57s 4/4 NRPE: Unable to read output

sharamun · June 27, 2006, 5:39pm

what was the output of executing:

/usr/lib/nagios/plugins/check_load -w 15,10,5 -c 30,25,20

manually from the command line as user nagios on tvp-crawl-01?

If it works, try modifying the check_nrpe command in checkcommands.cfg to put all arguments after $ARG1$ in quotes as in the following example:

command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -p 5666 -t 180 -c $ARG1$ -a "$ARG2$ $ARG3$ $ARG4$ $ARG5$ $ARG6$ $ARG7$"

The check_load plugin has a -c qualifier that conflicts with the -c qualifier in check_nrpe, so the quotes should eliminate the conflict.

ricmon · June 27, 2006, 7:14pm

Adding arguments didn’t work. Any other ideas?

sharamun · June 27, 2006, 7:24pm

…still haven’t seen the output of manually running check_load on tvp-crawl-01. Only other suggestion I can offer without seeing this is to make sure that the nrpe.cfg is set properly on tvp-crawl-01, in particular the following parameters:

allowed_hosts (has to include the IP address of your nagios server)
dont_blame_nrpe=1
command_timeout=180 (optional)

…also, nrpe must have been built with the --enable-command-args configure qualifier in order to support passing arguments through it.
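One quick way to review those three settings on tvp-crawl-01 is to pull them out of nrpe.cfg with grep. This is only a sketch: `show_nrpe_settings` is a hypothetical helper name, and the config path in the usage line is an assumption that varies by install.

```shell
# show_nrpe_settings is a hypothetical helper, not part of NRPE.
# It prints the active (uncommented) lines for the three settings
# listed above from the nrpe.cfg file whose path is passed in.
show_nrpe_settings() {
    grep -E '^[[:space:]]*(allowed_hosts|dont_blame_nrpe|command_timeout)[[:space:]]*=' "$1"
}

# Usage (the path is an assumption; adjust it to where your nrpe.cfg lives):
# show_nrpe_settings /etc/nagios/nrpe.cfg
```

A setting that doesn't show up in the output is either commented out or missing from the file.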

ricmon · June 27, 2006, 8:04pm

Sorry, I thought I had supplied it. I’m also using the same compiled nrpe on all servers.

/usr/lib/nagios/plugins/check_load -w 15,10,5 -c 30,25,20
OK - load average: 0.01, 0.01, 0.00|load1=0.010;15.000;30.000;0; load5=0.010;10.000;25.000;0; load15=0.000;5.000;20.000;0;

sharamun · June 27, 2006, 8:18pm

ok, now that you know the plugin runs fine on tvp-crawl-01, log in to your nagios server as user nagios, execute the following command and post the results:

/check_nrpe -H -p 5666 -t 180 -c check_load -a "-w 15,10,5 -c 30,25,20"

ricmon · June 27, 2006, 8:23pm

ran command per your instructions. got this back:

/usr/lib/nagios/plugins/check_nrpe -H 172.22.77.65 -p 5666 -t 180 -c check_load -a "-w 15,10,5 -c 30,25,20"
Warning threshold must be float or float triplet!

sharamun · June 27, 2006, 8:39pm

ok, try it without any values like in the following and post the results:

/usr/lib/nagios/plugins/check_nrpe -H 172.22.77.65 -p 5666 -t 180 -c check_load

BTW, I’m assuming that after making any nagios or nrpe configuration changes, on either the nagios server or tvp-crawl-01, you’ve stopped and restarted the corresponding daemon (nagios or nrpe). If not, make sure you do that first, as the configuration files only get read at startup.

What I’m getting at is this:

If you are passing check_load the arguments from the nagios server, the configuration should be:

nagios server checkcommands.cfg definition for check_nrpe:
command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -p 5666 -t 180 -c $ARG1$ -a "$ARG2$ $ARG3$ $ARG4$ $ARG5$ $ARG6$ $ARG7$"

nagios server services.cfg definition:
check_command check_nrpe!check_load!-w!15,10,5!-c!30,25,20

nrpe client nrpe.cfg command definition (on tvp-crawl-01):
command[check_load]=/usr/lib/nagios/plugins/check_load $ARG1$ $ARG2$ $ARG3$ $ARG4$

If you are specifying all check_load arguments on the nrpe client, the configuration should be:

nagios server checkcommands.cfg definition for check_nrpe_noargs:
command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -p 5666 -t 180 -c $ARG1$

nagios server services.cfg definition:
check_command check_nrpe_noargs!check_load

nrpe client nrpe.cfg command definition (on tvp-crawl-01):
command[check_load]=/usr/lib/nagios/plugins/check_load -w 15,10,5 -c 30,25,20

…choose one of those and it should work.
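To see which value ends up in which macro on the Nagios side, the sketch below splits a check_command value on "!" the way the two service definitions above rely on. `split_check_command` is a hypothetical helper written for illustration, not a Nagios tool, and it is simplified: it does not handle escaped "!" characters.

```shell
# split_check_command is a hypothetical helper, not part of Nagios.
# It splits a check_command value on "!" and prints the command name
# followed by the values that would land in $ARG1$, $ARG2$, and so on.
split_check_command() {
    old_ifs=$IFS
    IFS='!'
    set -f              # disable globbing while the value is split
    set -- $1           # unquoted on purpose: split on "!"
    set +f
    IFS=$old_ifs
    printf '%s\n' "command_name=$1"
    shift
    n=1
    for arg in "$@"; do
        printf '%s\n' "ARG$n=$arg"
        n=$((n + 1))
    done
}

split_check_command 'check_nrpe!check_load!-w!15,10,5!-c!30,25,20'
```

For the first option this prints check_load as ARG1 and the four check_load arguments as ARG2 through ARG5, which are the macros that option's command_line places inside the quotes after -a. For the second option, check_nrpe_noargs!check_load yields only ARG1.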

ricmon · June 28, 2006, 2:18pm

Is check_nrpe_noargs a name you made up, or is it a nagios reserved word?

sharamun · June 28, 2006, 3:53pm

check_nrpe_noargs is a recommended command definition to use so that it doesn’t conflict with your existing check_nrpe command definition; you’ll need to create the command definition for this in checkcommands.cfg (check_nrpe supports passing arguments, check_nrpe_noargs doesn’t)

ricmon · June 29, 2006, 5:21pm

I would like to thank sharamun for all the help. I now have nrpe working. Now comes the fun part: getting nagios to meet management’s expectations.

jakkedup · July 8, 2006, 3:38pm

The same topic in several threads? Looks like one should have been enough, but I might be wrong.