Hi, I’m having trouble with the check_nrpe plug-in and was wondering if anyone could help me.
I have a script on the client called check_HBA_stat that echoes either “All HBAs are online: $OUTPUT” right before exit 0, or “Please check using fcinfo (at least 1 HBA is offline): $OUTPUT” right before exit 2.
$OUTPUT is a single-line string containing each HBA’s status, either online or offline.
When I execute the script on the client from the command line, everything outputs as expected. However, when I run it using check_nrpe on the Nagios server, it does not print the $OUTPUT value. It only prints the message up to the point where the variable appears in the script. Can someone explain why this is, and maybe guide me on how to fix it?
Well, the usual cause of errors with NRPE is environment variables and the working directory.
=> when using check_nrpe, you’re in fact using a very light environment, with almost no variables.
Whereas when you’re launching your script in command line, you have quite a lot of variables set, and also a current working directory.
=> try to use only absolute paths (no …/my_script)
=> define all variables needed at the beginning of your script (MY_VAR=/usr/bin; export MY_VAR …etc).
I hope this will help you
if not, don’t hesitate to post again
note: try to do some outputs at different points in your script, to see where it comes from
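One way to see the difference on the client itself is to run a command with a nearly empty environment. The sketch below is only an approximation: `run_bare` is a hypothetical helper, and the PATH values passed to it are assumptions, since the real PATH that the NRPE daemon gets depends on how it was started.

```shell
#!/bin/sh
# Run a command line with an almost empty environment, roughly imitating
# how a daemon might launch a plugin. run_bare is a hypothetical helper;
# the PATH given as the first argument is an assumption, not NRPE's real PATH.
run_bare() {
    bare_path=$1
    shift
    # env -i clears every inherited variable; only PATH is set again.
    env -i PATH="$bare_path" /bin/sh -c "$*"
}

# 'ls' is used as a stand-in for any command the plugin calls by bare name.
# With a usual directory list the lookup should succeed...
run_bare /usr/bin:/bin 'command -v ls >/dev/null && echo found || echo missing'
# ...and with a PATH that points nowhere the same lookup should fail.
run_bare /nonexistent 'command -v ls >/dev/null && echo found || echo missing'
```

If a command that the script calls by its bare name comes back as missing here, that is a good hint about where the output is being lost.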
Are you running check_nrpe from the commandline, or through the service check in nagios? If through nagios, does the info return as expected through the command line? I’m thinking that maybe nagios doesn’t like something in the returned string (maybe something configured in illegal_macro_output_chars). What does the normal $OUTPUT result look like?
[code]
# if none is offline, then output message and exit OK
if [[ -z $CHECK_4_OFFLINE ]] ; then
    INDEX1=1
    while [[ $INDEX1 -le $HBA_COUNT ]] ; do
        HBA_STAT[$INDEX1]=$(sudo fcinfo hba-port | grep -i State | awk 'NR=='$INDEX1 | awk '{print $NF}')
        HBA_WWN[$INDEX1]=$(sudo fcinfo hba-port | grep -i 'Port WWN' | awk 'NR=='$INDEX1 | awk '{print $NF}')
        OUTPUT=$OUTPUT"HBA @ "${HBA_WWN[$INDEX1]}" is "${HBA_STAT[$INDEX1]}" ; "
        (( INDEX1=$INDEX1+1 ))
    done
    echo "All HBAs are online: $OUTPUT"
    exit 0
# if any HBA state is "offline", prompt user to check using fcinfo
else
    echo "Please check using fcinfo (at least 1 HBA is offline): $OUTPUT"
    exit 2
fi
[/code]
Strides,
If the script is run on the client, it outputs something like the below, as expected.
$ pwd
/usr/local/nagios/libexec
$ ./check_HBA_stat
All HBAs are online: HBA @ 2100000xxxxxxxxx is online ; HBA @ 10000000xxxxxxxx is online ; HBA @ 10000000xxxxxxxx is online ;
But if the script is run using check_nrpe on the Nagios server, $OUTPUT doesn’t show (check_HBA is the command I defined in nrpe.cfg: command[check_HBA]=/usr/local/nagios/libexec/check_HBA_stat).
$ pwd
/usr/local/nagios/libexec
$ ./check_nrpe -H xxx.xxx.xxx.xxx -c check_HBA
All HBAs are online:
Loose,
I think I defined all variables at the beginning. I am quite new to working in a Unix environment, so please bear with me. I do not quite understand what you mean by using only absolute paths. Do you mean when executing the script? I have the full path in the command definition in nrpe.cfg. And I resorted to using a variable because our servers have different numbers of HBAs. Would there be a different way to tackle this problem?
I think Loose is referring to using the full path to all the commands you are using in the script itself. For example, instead of
HBA_COUNT=$(sudo fcinfo hba-port | grep -i State | awk 'END{print NR}')
do
HBA_COUNT=$(/bin/sudo /path/to/fcinfo hba-port | /bin/grep -i State | /bin/awk 'END{print NR}')
(with appropriate paths set as per your particular server). As it’s not even running correctly from the Nagios server CLI, this is most likely the issue. Loose is usually pretty right about this sort of thing.
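To avoid repeating the full paths everywhere, they could be kept in variables at the top of the script and checked once before use. This is only a sketch: the paths are the placeholder ones from the example above and will differ per server, and `check_executables` is a hypothetical helper, not something that ships with Nagios or NRPE.

```shell
#!/bin/sh
# Placeholder paths taken from the example above; replace them with the
# real locations on your server (e.g. as reported by 'which').
SUDO=/bin/sudo
FCINFO=/path/to/fcinfo
GREP=/bin/grep
AWK=/bin/awk

# Hypothetical helper: return 3 (the Nagios UNKNOWN status) and print a
# message if any of the given paths is missing or not executable.
check_executables() {
    for tool_path in "$@"; do
        if [ ! -x "$tool_path" ]; then
            echo "UNKNOWN: $tool_path is missing or not executable"
            return 3
        fi
    done
    return 0
}
```

In the plugin this would be called near the top, for example as `check_executables "$SUDO" "$FCINFO" "$GREP" "$AWK" || exit 3`, so that a wrong path shows up in the check output instead of silently producing an empty $OUTPUT.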
Yes, it works after using the full paths to all commands. Now that I am able to output what I want, I even made the script more informative. Thank you sooo much, Strides and Loose!!