Nagios arguments for HP ProCurve switch memory


#1

I’m configuring Nagios to monitor my HP ProCurve switches. I found excellent command and service definitions at www.nagiosexchange.org and all is working wonderfully except for the service that monitors free memory.

The definitions that I’m using are copied and pasted directly from the above-mentioned site. They are:

command in commands.cfg:

define command{
command_name check_hpmemoryfree
command_line $USER1$/check_snmp -H $HOSTADDRESS$ -C $ARG1$ -o .1.3.6.1.4.1.11.2.14.11.5.1.1.2.1.1.1.6.1 -t 5 -w $ARG2$ -c $ARG3$ -u bytes -l free
}

service in switch.cfg:

[code]# Service definition MEM-FREE
define service{
use generic-service ; Name of service template to use

host_name			Switch_MDF-1
service_description		MEM-FREE
is_volatile			0
check_period			24x7
max_check_attempts		3
normal_check_interval		5
retry_check_interval		1
notification_interval		240
notification_period		24x7
notification_options		c,r
check_command			check_hpmemoryfree!nagios!2000:30000000!1000:30000000
}
[/code]
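To see which value lands where, here is a simplified Python sketch of how Nagios fills the $ARGn$ macros: the part of check_command before the first “!” names the command, and each following “!”-separated field becomes $ARG1$, $ARG2$, $ARG3$ and so on. It ignores the rest of Nagios’s macro processing, and the $USER1$ path (/usr/local/nagios/libexec) and host address (192.0.2.10) are hypothetical values chosen only for the example.

```python
# Simplified sketch: split a check_command on "!" and substitute $ARGn$
# plus a few other macros into the command_line from commands.cfg.
# This is not Nagios's real macro engine, only an illustration of the mapping.

COMMAND_LINE = (
    "$USER1$/check_snmp -H $HOSTADDRESS$ -C $ARG1$ "
    "-o .1.3.6.1.4.1.11.2.14.11.5.1.1.2.1.1.1.6.1 "
    "-t 5 -w $ARG2$ -c $ARG3$ -u bytes -l free"
)


def expand(check_command: str, command_line: str, macros: dict) -> str:
    """Return command_line with $ARGn$ and the extra macros filled in."""
    _name, *args = check_command.split("!")  # first field is the command name
    values = dict(macros)
    for i, arg in enumerate(args, start=1):
        values[f"ARG{i}"] = arg  # field 1 -> $ARG1$, field 2 -> $ARG2$, ...
    for key, value in values.items():
        command_line = command_line.replace(f"${key}$", value)
    return command_line


if __name__ == "__main__":
    print(expand(
        "check_hpmemoryfree!nagios!2000:30000000!1000:30000000",
        COMMAND_LINE,
        # Hypothetical values, not taken from the thread:
        {"USER1": "/usr/local/nagios/libexec", "HOSTADDRESS": "192.0.2.10"},
    ))
```

With the service above, this puts the community string nagios after -C, 2000:30000000 after -w and 1000:30000000 after -c, which is why ARG2 is the warning threshold and ARG3 the critical one.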

The switches list about 150MB of total memory, with about 109MB free when I view status from the switch console itself. Nagios is correctly reporting the free 109MB, but is showing the state as critical.

I’ve done a good bit of googling to try to understand how the “2000:30000000” and “1000:30000000” sections work. I realize that those are ARG2 and ARG3, and that ARG2 is the warning level and ARG3 is the critical level. What I don’t understand is how to adjust those numbers to get the warning and critical levels I want on my particular switches. I’ve found info stating that two numbers separated by a colon define a range, and other info saying they are a less-than:greater-than definition of when to return the state defined by the command.

What I’d like is to have the following:

-More than 60MB of free memory = OK
-Between 40MB and 60MB of free memory = Warning
-Less than 40MB of free memory = Critical

I will likely adjust those values once I get a better idea of memory usage under different loads.

I’d like to understand how to adjust the numbers in the service definition so that my service monitors work as listed above. Can someone explain this, or point me to a resource that explains what the colon-separated numbers mean for this particular command? I haven’t had any luck in my searching, but I’m continuing to look for as much information as I can.


#2

Hi,

From the command “./check_snmp --help”:
[code]-w, --warning=INTEGER_RANGE(s)
Range(s) which will not result in a WARNING status
-c, --critical=INTEGER_RANGE(s)
Range(s) which will not result in a CRITICAL status
[/code]

Well, here you go :)
Yes, that’s a very weird way to do thresholds, and it’s really confusing!
My advice: think about it the “normal” way and input the opposite.
i.e. you want a warning between 40 and 60MB? => that means you want a warning as soon as the value drops below 60,
so just input “-w 60:” (or 60000:, or with more “0”s, I don’t know :))

You want a critical between 0 and 40 => “-c 40:”

Even now, I’m not sure of the values I gave above… I had to correct them twice… that’s really a weird way to do it.
Anyway, try to understand how it works and you’ll do fine :)

(also, you can get rid of this plugin and script a new one :))
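One way to check threshold values like these is a small Python sketch of the range format described in the Nagios plugin development guidelines: “N” means alert outside 0..N, “N:” alert below N, “~:N” alert above N, “A:B” alert outside A..B, and “@A:B” alert inside A..B, endpoints included. It assumes check_snmp in these plugin versions follows that format (the --help text quoted above is consistent with it) and that the OID returns free memory in bytes, as the -u bytes label and the 30000000 in the original thresholds suggest. For simplicity, 1MB is taken as 1,000,000 bytes.

```python
import math
from dataclasses import dataclass


@dataclass
class Threshold:
    start: float
    end: float
    alert_inside: bool  # True for "@start:end": alert when start <= value <= end


def parse(spec: str) -> Threshold:
    """Parse a threshold written in the Nagios plugin range format."""
    alert_inside = spec.startswith("@")
    if alert_inside:
        spec = spec[1:]
    if ":" in spec:
        lo, hi = spec.split(":", 1)
    else:
        lo, hi = "0", spec  # "N" is shorthand for "0:N"
    start = -math.inf if lo == "~" else float(lo or 0)  # "~" = no lower bound
    end = math.inf if hi == "" else float(hi)           # "N:" = no upper bound
    return Threshold(start, end, alert_inside)


def alerts(spec: str, value: float) -> bool:
    """True if value triggers the threshold described by spec."""
    t = parse(spec)
    inside = t.start <= value <= t.end
    return inside if t.alert_inside else not inside


def status(value: float, warning: str, critical: str) -> str:
    """Critical is checked first, then warning, as plugins usually do."""
    if alerts(critical, value):
        return "CRITICAL"
    if alerts(warning, value):
        return "WARNING"
    return "OK"


if __name__ == "__main__":
    free = 109_000_000  # about the 109MB reported in the first post, in bytes
    print(status(free, "2000:30000000", "1000:30000000"))  # CRITICAL
    print(status(free, "60000000:", "40000000:"))          # OK
    print(status(50_000_000, "60000000:", "40000000:"))    # WARNING
    print(status(30_000_000, "60000000:", "40000000:"))    # CRITICAL
```

Under those assumptions, the original 1000:30000000 critical range explains the CRITICAL state: about 109,000,000 bytes is above the 30,000,000 upper bound, so the value falls outside the range that “will not result in a CRITICAL status”. It also shows why the unit matters for the “60:” / “40:” suggestion: compared against a value in bytes, those thresholds would practically never alert, whereas 60000000: and 40000000: give the OK/Warning/Critical split asked for in the first post.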


#3

Thanks Loose, I think the way you suggested defining the service makes a little more sense than the way I was trying to do it. I appreciate the help.


#4

Hello,

I have a few HP ProCurve 1800-24G switches, model “J9028B”.

I’m getting this return message: “Return code of 127 is out of bounds - plugin may be missing”

Any ideas on what I can do or what I can use to monitor these switches?

thank you,
-Rudy


#5

What command are you running? If you tell us, maybe somebody can help. -…


#6

Luca,

I’m using the following just to test:

define command{
command_name check_hpmemoryfree
command_line $USER1$/check_snmp -H $HOSTADDRESS$ -C $ARG1$ -o .1.3.6.1.4.1.11.2.14.11.5.1.1.2.1.1.1.6.1 -t 5 -w $ARG2$ -c $ARG3$ -u bytes -l free
}

I put that definition in my /usr/local/nagios/etc/objects/commands.cfg file.

exchange.nagios.org/directory/Pl … ve/details

Let me know if I’m doing it wrong.

thank you,
Rudy


#7

Do you have check_snmp in your libexec directory?


#8

Bingo!
It’s not listed in the libexec folder… what should I do next?

-Noob
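Luca’s question can be checked with a short Python sketch. It looks for the plugin file and its execute permission, and also shows where the 127 comes from: when /bin/sh is asked to run a program that does not exist, it exits with status 127, the same code as in the “Return code of 127 is out of bounds” message. The /usr/local/nagios/libexec path is an assumption based on the /usr/local/nagios/etc path used above (it should match $USER1$, normally set in resource.cfg), and check_does_not_exist is a deliberately made-up name.

```python
import os
import subprocess
from typing import Optional

# Assumed libexec location for a source install; adjust to match $USER1$.
LIBEXEC = "/usr/local/nagios/libexec"


def plugin_problem(name: str, libexec: str = LIBEXEC) -> Optional[str]:
    """Describe why a plugin cannot run, or return None if it looks fine."""
    path = os.path.join(libexec, name)
    if not os.path.exists(path):
        return f"{path} does not exist"
    if not os.access(path, os.X_OK):
        return f"{path} is not executable"
    return None


def shell_exit_code(command: str) -> int:
    """Run a command through /bin/sh and return its exit status."""
    return subprocess.run(["/bin/sh", "-c", command], capture_output=True).returncode


if __name__ == "__main__":
    print(plugin_problem("check_snmp") or "check_snmp is present and executable")
    # A program that cannot be found makes the shell exit with 127.
    print(shell_exit_code(os.path.join(LIBEXEC, "check_does_not_exist")))
```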


#9

Recompile the plugins… when you run the configure command, it tells you something like WARNING: net-snmp missing…
You’ll need to install the snmp package (and snmpd too, so at least you can test SNMP on your localhost).
How to install these depends on the distribution you are running…


#10

Thanks Luca!

I followed these directions: unixmen.com/linux-tutorials/ … 9x-and-10x

They are for Ubuntu Server 10.04, and I’m on Nagios 3.2.2 using the 1.4.15 plug-ins.

Do I just redo the install plug-ins step?

-Rudy


#11
 sudo ./configure --with-nagios-user=nagios --with-nagios-group=nagios

When you run this step you have to check the output: it tells you, through warnings, which plugins will NOT be compiled.
Until you get the snmp warning fixed, check_snmp will not compile.
You’ll probably need to install the snmp, snmpd and snmplib packages. In Debian you use aptitude; not sure what Ubuntu uses to install packages.
Can’t help you there.
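To keep the configure warnings from scrolling past, a Python sketch like the following can rerun ./configure with the options used in this thread and return only the lines containing “WARNING”, optionally filtered by a keyword such as “snmp”. The function name and its keyword parameter are hypothetical. Separately, autoconf-generated configure scripts also write a detailed config.log in the source directory, which is the file to read afterwards for the full details.

```python
import subprocess

# Options taken from the configure command quoted in this thread.
CONFIGURE_ARGS = ("--with-nagios-user=nagios", "--with-nagios-group=nagios")


def configure_warnings(src_dir: str, args=CONFIGURE_ARGS, keyword: str = "") -> list:
    """Run ./configure in src_dir and return its WARNING lines (stdout, then stderr)."""
    result = subprocess.run(
        ["./configure", *args],  # resolved relative to cwd on POSIX
        cwd=src_dir,
        capture_output=True,
        text=True,
    )
    output = result.stdout + result.stderr
    return [
        line
        for line in output.splitlines()
        if "WARNING" in line and keyword.lower() in line.lower()
    ]
```

For example, configure_warnings("nagios-plugins-1.4.15", keyword="snmp") run next to the unpacked source would list only the SNMP-related warnings; an empty list means configure printed none.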


#12

Thank you Luca,

Ubuntu can use apt-get or aptitude. I’ve been using aptitude since it gathers all the parts needed to complete the puzzle.

Update:

So I re-ran the step, but I wasn’t able to see what I needed. Is there a way to display the log?

Lastly, do I just use the following command: “aptitude snmp snmpd snmplib packages”?

Thank you,

-Rudy


#13

For anyone who’s still having issues with this, run these steps:

  1. “aptitude install nagios-snmp-plugins”
  2. “cd ~/downloads”
  3. “cd nagios-plugins-1.4.11”
  4. Compile and install the plugins: “./configure --with-nagios-user=nagios --with-nagios-group=nagios”
  5. “make”
  6. “make install”

That should fix it for anyone missing check_snmp from their libexec folder.

-Rudy
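The compile-and-install steps above can also be driven from a small Python sketch that runs them in order and stops at the first failing step, so a configure problem does not roll on into make. The build_plugins name is hypothetical, and make install will normally need root (for example, running the script with sudo).

```python
import subprocess

# The configure/make/make install steps from the list above.
STEPS = (
    ["./configure", "--with-nagios-user=nagios", "--with-nagios-group=nagios"],
    ["make"],
    ["make", "install"],
)


def build_plugins(src_dir: str, steps=STEPS) -> None:
    """Run each step in src_dir; raise CalledProcessError at the first failure."""
    for step in steps:
        print("+", " ".join(step), flush=True)  # echo the command being run
        subprocess.run(step, cwd=src_dir, check=True)
```

For example, build_plugins(os.path.expanduser("~/downloads/nagios-plugins-1.4.11")) after importing os; once it finishes, check_snmp should show up in the libexec folder.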