Perl check script, telnet output, redirection problem

I have a perplexing problem with a check script I wrote; hopefully someone can lend some insight…

On our remote systems we have a hacked-up version of mbmon running. mbmon accesses the lm80 (and other) hardware sensor chips and reports hardware status. It runs in daemon mode, which just means it sits there listening on a socket and, when someone telnets to it, it reports a line of info and closes the socket. It is “hacked up” in that it sends back all the info as a CSV list, as opposed to the standard format.
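For anyone who wants to exercise a check like this without the real hardware, here is a minimal sketch of a stand-in for the daemon behaviour described above: accept a connection, write one CSV line, close. The sensor values are made up, the helper name serve_once is hypothetical, and the loopback port is ephemeral rather than the real port.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Made-up reading: temps, fans, CPU core voltages, then the five voltage rails.
my $CSV_LINE = "41.0,38.5,0.0,4500,4200,0,1.48,1.50,3.31,5.02,12.10,-11.90,-5.05";

# Accept one connection, send a single CSV line, and close the socket.
sub serve_once {
    my ($server, $csv) = @_;
    my $conn = $server->accept or die "accept failed: $!";
    print {$conn} "$csv\n";
    close $conn;
    return;
}

# Listen on an ephemeral loopback port so the sketch cannot clash with a real daemon.
my $server = IO::Socket::INET->new(
    LocalAddr => '127.0.0.1',
    LocalPort => 0,
    Proto     => 'tcp',
    Listen    => 5,
) or die "cannot listen: $!";

# The connection completes in the kernel's listen backlog, so a single
# process can connect first and accept afterwards.
my $client = IO::Socket::INET->new(
    PeerAddr => '127.0.0.1',
    PeerPort => $server->sockport,
    Proto    => 'tcp',
) or die "cannot connect: $!";

serve_once($server, $CSV_LINE);

my $received = <$client>;    # the one line of data
my $eof      = <$client>;    # undef, because the server side has already closed
chomp $received;
print "received: $received\n";
```

A long-running version would simply call serve_once in a loop on a fixed port.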

The Perl check script (check_mbmon.pl) works fine from the command line. You provide it a host address, the sensor you wish to check, and the warning and critical levels, and it sends back its check results. It achieves this by running the telnet command via backticks and capturing the output in an array (@stuff = `telnet $host $port 2>/dev/null`). It then finds the correct line in the array and splits the CSV list into separate sensor values. Works first time, every time…from the command line.
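The parsing step can be tried on its own, without telnet or the daemon. The sketch below assumes the field order used by the split() in the script; the sample lines and values are made up, and parse_mbmon_output is a hypothetical helper name. It requires exactly 13 fields rather than just "contains a comma", which is slightly stricter than the script.

```perl
use strict;
use warnings;

# Field order as used by the split() in check_mbmon.pl.
my @FIELDS = qw(temp1 temp2 temp3 fan1 fan2 fan3 proc1 proc2
                volt33 volt5 volt12 volt12neg volt5neg);

# Return a hash ref of sensor name => value for the first line that has
# exactly 13 comma-separated fields, or undef if there is no such line.
sub parse_mbmon_output {
    my @lines = @_;
    for my $line (@lines) {
        chomp(my $copy = $line);
        $copy =~ s/\r$//;                 # tolerate CRLF line endings
        my @values = split /,/, $copy;
        next unless @values == @FIELDS;   # skip telnet boilerplate
        my %reading;
        @reading{@FIELDS} = @values;
        return \%reading;
    }
    return;
}

# Made-up example of what the backticks might capture.
my @sample = (
    "Trying 127.0.0.1...\n",
    "Connected to localhost.\n",
    "Escape character is '^]'.\n",
    "41.0,38.5,0.0,4500,4200,0,1.48,1.50,3.31,5.02,12.10,-11.90,-5.05\n",
    "Connection closed by foreign host.\n",
);

my $reading = parse_mbmon_output(@sample);
print "fan1 = $reading->{fan1}, volt12neg = $reading->{volt12neg}\n";
```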

However, when used from Nagios in a check command, it never gets the line of output containing the CSV list from the telnet command…actually, almost never. It receives the connect and disconnect messages from the telnet session, just not the line that has the sensor data. I say almost never because I let it run overnight and checked my archive for the service: it actually recovered 6 times and reported correct sensor data. But then it goes for hours, checking every couple of minutes, and never gets anything. Yet I have run the check script from the command line at least 500 times and have always received proper sensor data.

So, to summarize: on the remote system I have a daemon listening on a socket; when it gets an open, it writes a line to the socket and closes it. On the Nagios host I have a check script that telnets to the remote host/port, receives the data, parses it and generates notifications. It works from the command line but doesn’t work from within the Nagios framework (usually doesn’t, on occasion does). My guess is that it has something to do with output redirection, but darn if I can figure it out.

Check script attached (and feel free to use for your own purposes if you have need of it), any assistance appreciated.

Bah, I tried to attach the script, but it looks like it didn’t take for whatever reason. Here it is…


#!/usr/bin/perl -w

# Tell Perl what we need to use

use strict;
use Getopt::Std;

use vars qw($opt_f $opt_p $opt_t $opt_v
            $opt_c $opt_w $opt_W $opt_C
            $host $port $opt_H $sensor
            $temp1 $temp2 $temp3
            $fan1 $fan2 $fan3
            $proc1 $proc2
            $volt33 $volt5 $volt12 $volt12neg $volt5neg
            $crit_level_low $warn_level_low
            $crit_level_high $warn_level_high
            %exit_codes $msg @stuff $line);

# Predefined exit codes for Nagios
# (UNKNOWN is 3; an exit value of -1 would reach Nagios as 255)

%exit_codes = ( 'UNKNOWN',  3,
                'OK',       0,
                'WARNING',  1,
                'CRITICAL', 2, );

# Process options

if ($#ARGV <= 0)    # numeric comparison: fewer than two arguments
{
    &usage;
}
else
{
    # -p (not -P) so that $opt_p is actually set
    getopts('c:C:f:H:p:t:v:w:W:');
}

# Error check options

if (!$opt_w or $opt_w == 0 or !$opt_c or $opt_c == 0 or !$opt_W or $opt_W == 0 or !$opt_C or $opt_C == 0)
{
    print "*** You must define WARN and CRITICAL levels!";
    &usage;
}
elsif (!$opt_f and !$opt_p and !$opt_t and !$opt_v)
{
    print "*** You must select a sensor (f, p, t or v) to monitor!";
    &usage;
}

if (($opt_f and ($opt_p or $opt_t or $opt_v)) or
    ($opt_p and ($opt_t or $opt_v)) or
    ($opt_t and $opt_v))
{
    print "*** You must select only one sensor (f, p, t or v) to monitor!";
    &usage;
}

if ($opt_w > $opt_W)
{
    print "*** WARN low end (w) must be less than WARN high end (W)!";
    &usage;
}
elsif ($opt_c > $opt_C)
{
    print "*** CRITICAL low end (c) must be less than CRITICAL high end (C)!";
    &usage;
}
elsif ($opt_w <= $opt_c or $opt_W >= $opt_C)
{
    print "*** WARN range must be contained within CRITICAL range!";
    &usage;
}

if (!$opt_H)
{
    $host = "localhost";
}
else
{
    $host = $opt_H;
}

$port = 4747;

# telnet to mbmon port to get sensor readings (4th line of output)

@stuff = `telnet $host $port 2>/dev/null`;

foreach $line (@stuff)
{
    chomp $line;

    print $line;

    # only the data line contains commas
    if ($line =~ /,/)
    {
        ($temp1, $temp2, $temp3, $fan1, $fan2, $fan3, $proc1, $proc2, $volt33, $volt5, $volt12, $volt12neg, $volt5neg) = split(/,/, $line);
    }
}

# Pick the requested sensor and build the status message

if ($opt_f)
{
    if ($opt_f == 1)
    {
        $msg = "Fan 1 rotation speed is $fan1\n";
        $sensor = $fan1;
    }
    elsif ($opt_f == 2)
    {
        $msg = "Fan 2 rotation speed is $fan2\n";
        $sensor = $fan2;
    }
    elsif ($opt_f == 3)
    {
        $msg = "Fan 3 rotation speed is $fan3\n";
        $sensor = $fan3;
    }
    else
    {
        print "*** Select Fan 1, 2 or 3!";
        &usage;
    }
}
elsif ($opt_p)
{
    if ($opt_p == 1)
    {
        $msg = "CPU 1 core voltage is $proc1\n";
        $sensor = $proc1;
    }
    elsif ($opt_p == 2)
    {
        $msg = "CPU 2 core voltage is $proc2\n";
        $sensor = $proc2;
    }
    else
    {
        print "*** Select Processor core 1 or 2!";
        &usage;
    }
}
elsif ($opt_t)
{
    if ($opt_t == 1)
    {
        $msg = "Temperature sensor 1 is $temp1\n";
        $sensor = $temp1;
    }
    elsif ($opt_t == 2)
    {
        $msg = "Temperature sensor 2 is $temp2\n";
        $sensor = $temp2;
    }
    elsif ($opt_t == 3)
    {
        $msg = "Temperature sensor 3 is $temp3\n";
        $sensor = $temp3;
    }
    else
    {
        print "*** Select Temperature 1, 2 or 3!";
        &usage;
    }
}
elsif ($opt_v)
{
    if ($opt_v == 1)
    {
        $msg = "3.3 volt sensor is $volt33\n";
        $sensor = $volt33;
    }
    elsif ($opt_v == 2)
    {
        $msg = "5 volt sensor is $volt5\n";
        $sensor = $volt5;
    }
    elsif ($opt_v == 3)
    {
        $msg = "12 volt sensor is $volt12\n";
        $sensor = $volt12;
    }
    elsif ($opt_v == 4)
    {
        $msg = "Negative 12 volt sensor is $volt12neg\n";
        $sensor = $volt12neg;
    }
    elsif ($opt_v == 5)
    {
        $msg = "Negative 5 volt sensor is $volt5neg\n";
        $sensor = $volt5neg;
    }
    else
    {
        print "*** Select Voltage 1 (3.3v), 2 (5v), 3 (12v), 4 (-12v) or 5 (-5v)!";
        &usage;
    }
}
else
{
    print "*** You must select a valid sensor (f, p, t or v) to monitor!";
    &usage;
}

# Compare the reading against the thresholds

$warn_level_low = $opt_w;
$crit_level_low = $opt_c;
$warn_level_high = $opt_W;
$crit_level_high = $opt_C;

# Test for missing data rather than truth, so that a reading of 0
# (e.g. a stopped fan) is evaluated instead of reported as UNKNOWN
if (!defined $sensor or $sensor eq '')
{
    print "Sensor UNKNOWN - No data from mbmon";
    exit $exit_codes{'UNKNOWN'};
}
elsif ($sensor <= $crit_level_low)
{
    print "Sensor CRITICAL - $msg";
    exit $exit_codes{'CRITICAL'};
}
elsif ($sensor >= $crit_level_high)
{
    print "Sensor CRITICAL - $msg";
    exit $exit_codes{'CRITICAL'};
}
elsif ($sensor <= $warn_level_low)
{
    print "Sensor WARNING - $msg";
    exit $exit_codes{'WARNING'};
}
elsif ($sensor >= $warn_level_high)
{
    print "Sensor WARNING - $msg";
    exit $exit_codes{'WARNING'};
}
else
{
    print "Sensor OK - $msg";
    exit $exit_codes{'OK'};
}

# Show usage

sub usage()
{
    print "\ncheck_mbmon.pl v0.3 - Nagios Plugin\n\n";
    print "usage:\n";
    print "  check_mbmon.pl -H <Host Name/IP> -<tN|fN|pN|vN> -w N -c N -W N -C N\n\n";
    print "options:\n";
    print "  -t    check Temperature Sensor N\n";
    print "        N  Meaning\n";
    print "        1  Temperature Sensor 1\n";
    print "        2  Temperature Sensor 2\n";
    print "        3  Temperature Sensor 3\n";
    print "  -f    check Fan N Rotation Speed\n";
    print "        N  Meaning\n";
    print "        1  Fan 1 Sensor\n";
    print "        2  Fan 2 Sensor\n";
    print "        3  Fan 3 Sensor\n";
    print "  -p    check Processor N Core Voltage\n";
    print "        N  Meaning\n";
    print "        1  Processor 1 Core Voltage Sensor\n";
    print "        2  Processor 2 Core Voltage Sensor\n";
    print "  -v    check Voltage Sensor N\n";
    print "        N  Meaning\n";
    print "        1  3.3V Sensor\n";
    print "        2  5V Sensor\n";
    print "        3  12V Sensor\n";
    print "        4  -12V Sensor\n";
    print "        5  -5V Sensor\n";
    print "  -w N  WARN if sensor < N\n";
    print "  -c N  CRITICAL if sensor < N\n";
    print "  -W N  WARN if sensor > N\n";
    print "  -C N  CRITICAL if sensor > N\n";

    exit $exit_codes{'UNKNOWN'};
}
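
The threshold comparison at the end of the script can be pulled out into a small function and checked in isolation. This is only a sketch of that logic: classify is a hypothetical name, the fan thresholds are made up, and the exit-code mapping assumes the standard Nagios plugin values (0, 1, 2, 3). It treats a missing value as UNKNOWN but evaluates a reading of 0 normally.

```perl
use strict;
use warnings;

# Standard Nagios plugin return codes.
my %EXIT_CODE = (OK => 0, WARNING => 1, CRITICAL => 2, UNKNOWN => 3);

# Map a reading onto a Nagios status using a low/high pair of thresholds
# for each of the WARNING and CRITICAL levels.
sub classify {
    my ($value, $crit_low, $warn_low, $warn_high, $crit_high) = @_;
    return 'UNKNOWN'  if !defined $value || $value eq '';
    return 'CRITICAL' if $value <= $crit_low || $value >= $crit_high;
    return 'WARNING'  if $value <= $warn_low || $value >= $warn_high;
    return 'OK';
}

# Made-up fan thresholds in RPM, as if called with -c 1000 -w 2000 -W 6000 -C 7000.
for my $rpm (4500, 6500, 0, undef) {
    my $status = classify($rpm, 1000, 2000, 6000, 7000);
    my $shown  = defined $rpm ? $rpm : 'no data';
    print "$shown => $status (exit $EXIT_CODE{$status})\n";
}
```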

Sometimes it works, others it doesn’t? Perhaps CPU overload? Maybe try to run only this check, and see what happens. You might want to try and increase the check_interval, so it checks every 10 or 20 minutes, instead of 5 or 6.

Actually, it turns out to be some weirdness between telnet and the extra layers of redirection its output passes through. I tried a whole bunch of things and always got sporadic responses…I always got the boilerplate telnet stuff (trying to connect, escape character is, etc.) but only infrequently received the one data line that I was supposed to get. Yet when running the check script directly at the terminal, I always got it. Anyway, I just replaced the telnet command in the check script with a socket open of the port and, voilà, it works like a champ. So if anyone wants to use that check script for anything, just replace these lines:

# telnet to mbmon port to get sensor readings (4th line of output)

@stuff = `telnet $host $port 2>/dev/null`;

foreach $line ( @stuff)
{
    chomp $line;

    print $line;

    if ( grep /,/, $line )
    {
        ($temp1, $temp2, $temp3, $fan1, $fan2, $fan3, $proc1, $proc2, $volt33, $volt5, $volt12, $volt12neg, $volt5neg) = split(/,/, $line);
    }

}

with these lines:

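use IO::Socket;

# open a socket to the mbmon port and read the single line it sends
$remote = IO::Socket::INET->new(
    Proto    => "tcp",
    PeerAddr => $host,
    PeerPort => $port,
) or die "cannot connect to mbmon port at $host";
$line = <$remote>;
close $remote;
chomp $line if defined $line;    # keep the newline out of the last field

($temp1, $temp2, $temp3, $fan1, $fan2, $fan3, $proc1, $proc2, $volt33, $volt5, $volt12, $volt12neg, $volt5neg) = split(/,/, $line) if defined $line;

(and declare your $remote variable in the use vars list up top).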
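
One thing the socket version does not guard against is a daemon that accepts the connection and then never sends anything, which would leave the check hanging. The sketch below is one way to add a timeout; it is not part of the original script. fetch_mbmon_line is a hypothetical helper, the 5-second timeout and the sensor values are made up, and a forked loopback listener stands in for mbmon so the example runs on its own.

```perl
use strict;
use warnings;
use IO::Socket::INET;
use IO::Select;

# Connect, wait up to $timeout seconds for data, and return the first line
# without its line ending. Returns undef on connect failure, timeout or EOF.
sub fetch_mbmon_line {
    my ($host, $port, $timeout) = @_;
    my $sock = IO::Socket::INET->new(
        PeerAddr => $host,
        PeerPort => $port,
        Proto    => 'tcp',
        Timeout  => $timeout,    # applies to the connect
    ) or return;
    # can_read returns an empty list if nothing arrives in time
    my @ready = IO::Select->new($sock)->can_read($timeout);
    my $line  = @ready ? <$sock> : undef;
    close $sock;
    return unless defined $line;
    $line =~ s/\r?\n\z//;
    return $line;
}

# Loopback stand-in for the daemon (ephemeral port, made-up reading).
my $server = IO::Socket::INET->new(
    LocalAddr => '127.0.0.1',
    LocalPort => 0,
    Proto     => 'tcp',
    Listen    => 1,
) or die "cannot listen: $!";
my $port = $server->sockport;

my $pid = fork();
die "fork failed: $!" unless defined $pid;
if ($pid == 0) {
    # child: answer one connection the way the daemon does, then exit
    my $conn = $server->accept or exit 1;
    print {$conn} "41.0,38.5,0.0,4500,4200,0,1.48,1.50,3.31,5.02,12.10,-11.90,-5.05\n";
    close $conn;
    exit 0;
}

my $line = fetch_mbmon_line('127.0.0.1', $port, 5);
waitpid($pid, 0);

if (defined $line) {
    my @values = split /,/, $line;
    print "fan1 = $values[3]\n";
}
else {
    # a real plugin would exit with the UNKNOWN code here instead of die-ing
    print "Sensor UNKNOWN - no data from mbmon\n";
}
```

Once data has started to arrive, the readline can still block if the peer sends a partial line and stalls, so this only covers the "nothing at all" case.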
[Edited Mon Apr 18 2005, 01:42AM]

Maybe you would like to make a separate post, something like “hey, here is a new perl script to check…”. You know, so that more people will actually see your thread and your hard work gets put to more use.