SNMP monitoring


#1

I have come across this problem a number of times.
It concerns check_snmp.
I am monitoring my APC UPS with an OID whose result should be OK when it is greater than 85%. But Nagios alerts when the value rises above a threshold, whereas in my case it should alert when the value falls below 85%. Here is the relevant part of service.cfg:
define service{
use generic-nix ; Name of service template to use

  host_name                       UPS 1
  service_description             upsBatteryCapacity
  is_volatile                     0
  check_period                    24x7
  max_check_attempts              3
  normal_check_interval           1
  retry_check_interval            1
  contact_groups                  Admins
  notification_interval           0
  notification_period             24x7
  notification_options            c,r
  check_command                   check_snmp!1.3.6.1.4.1.318.1.1.1.2.2.1.0!98!100!communitystring
    }

This service monitors whether the UPS battery is at 100%. The problem is that when I specify thresholds, it obviously gives me a warning or critical alert, because Nagios only compares the value against the thresholds. How can I make Nagios give a critical alert when the battery capacity falls below 85%, a warning between 85% and 90%, and green at 100%?
Thanks


#2

“./check_snmp --help” or search the forum; “lower than” warnings and criticals have already been discussed (quite recently, if I remember right).

Luca
[ Edited Fri Nov 11 2005, 02:36AM ]


#3

" check_command check_snmp!1.3.6.1.4.1.318.1.1.1.2.2.1.0!98!100!communitystring
} "

We have no way of knowing what your command means or whether it’s correct. You have to show us the definition of the command from checkcommands.cfg.
And as Luca stated, simply use --help and it tells you how to set thresholds.


#4

Here is my check_snmp command definition,

# 'check_snmp' command definition

define command{
command_name check_snmp
command_line $USER1$/check_snmp -H $HOSTADDRESS$ -o $ARG1$ -w $ARG2$ -c $ARG3$ -C $ARG4$
}


#5

From your checkcommands definition and the check above, you have:
-w = 98
-c = 100

So what you have defined is: give a warning if the returned value is above 98, and a critical if it is 100 or above.
This is from ./check_snmp --help: "Bare integers are interpreted as upper limits."
I think what you really want is this:
-w 80:85
-c 0:79
Warning if the battery capacity value is from 80 to 85, and critical if it is from 0 to 79.
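To make the mapping from the service's check_command to -w and -c explicit, here is a rough Python sketch of how the `!`-separated arguments fill the $ARGn$ macros. The `expand` function is a hypothetical helper for illustration only, not Nagios code; it ignores escaped `!` characters and leaves $USER1$ and $HOSTADDRESS$ untouched.

```python
def expand(check_command: str, command_line: str) -> str:
    """Substitute $ARG1$, $ARG2$, ... with the '!'-separated arguments.

    Simplified illustration: Nagios itself also handles escaping and
    many other macros, none of which is modelled here.
    """
    _name, *args = check_command.split("!")
    for index, arg in enumerate(args, start=1):
        command_line = command_line.replace(f"$ARG{index}$", arg)
    return command_line


# Values copied from the service and command definitions in this thread.
service_check = "check_snmp!1.3.6.1.4.1.318.1.1.1.2.2.1.0!98!100!communitystring"
template = "$USER1$/check_snmp -H $HOSTADDRESS$ -o $ARG1$ -w $ARG2$ -c $ARG3$ -C $ARG4$"

print(expand(service_check, template))
```

So the third `!` field ends up after -w and the fourth after -c, which is why changing those two fields in the service definition changes the thresholds.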


#6

Sir, I think I am a bit confused.

This is my service.cfg
define service{
use generic-nix ; Name of service template to use

  host_name                       UPS 1
  service_description             upsBatteryCapacity
  is_volatile                     0
  check_period                    24x7
  max_check_attempts              3
  normal_check_interval           1
  retry_check_interval            1
  contact_groups                  Admins
  notification_interval           0
  notification_period             24x7
  notification_options            c,r
  check_command                   check_snmp!1.3.6.1.4.1.318.1.1.1.2.2.1.0!80:85!0:79!comms
    }

This is my checkcommands.cfg

# 'check_snmp' command definition

define command{
command_name check_snmp
command_line $USER1$/check_snmp -H $HOSTADDRESS$ -o $ARG1$ -w $ARG2$ -c $ARG3$ -C $ARG4$
}

Following the command definition above, I passed the threshold arguments in the service definition:

-w 80:85
-c 0:79
The battery is at 100%, but it still shows critical.


#7

Do I need to include
-o 86:100 ?


#8

Also, ./check_snmp --help does not work. I have also tried --h and ?.
I am using Red Hat 9.


#9

ok, here’s my result,
[[email protected] libexec]# ./check_snmp
check_snmp: Could not parse arguments
Usage: check_snmp -H <ip_address> -o <OID> [-w warn_range] [-c crit_range]
[-C community] [-s string] [-r regex] [-R regexi]
[-t timeout] [-e retries]
[-l label] [-u units] [-p port-number] [-d delimiter]
[-D output-delimiter] [-m miblist] [-P snmp version]
[-L seclevel] [-U secname] [-a authproto] [-A authpasswd]
[-X privpasswd]

[[email protected] libexec]# ./check_snmp -H 10.1.5.15 -o 1.3.6.1.4.1.318.1.1.1.2.2.1.0 -w 80:90 -c 0:80 -C communitys
SNMP CRITICAL - 100

Even though the ranges I specified are 0 to 80 for critical and 80 to 90 for warning, it still returns a critical status and shows the result as 100.


#10

I messed up. This is from --help:
"-w, --warning=INTEGER_RANGE(s)
Range(s) which will not result in a WARNING status
-c, --critical=INTEGER_RANGE(s)
Range(s) which will not result in a CRITICAL status"
So it’s
-w 91:100
-c 85:90
You will NOT get a warning if the value is between 91 and 100, and you will NOT get a critical if the value is between 85 and 90. Try that.
Oh, and with check_snmp you should ALWAYS use the -m option, or else your PC has to search through all of the MIBs to find what it’s looking for. With the -m option you TELL IT which one to use, and the command runs much quicker.


#11

Sir,
I did this, but it still gave me a critical status for the service :(

define service{
use generic-nix ; Name of service template to use

  host_name                       UPS 1
  service_description             upsBatteryCapacity
  is_volatile                     0
  check_period                    24x7
  max_check_attempts              3
  normal_check_interval           1
  retry_check_interval            1
  contact_groups                  Admins
  notification_interval           0
  notification_period             24x7
  notification_options            c,r
  check_command                   check_snmp!1.3.6.1.4.1.318.1.1.1.2.2.1.0!91:100!85:90!commS
    }

Then I also tried this, and it gave me a warning:
define service{
use generic-nix ; Name of service template to use

  host_name                       UPS 1
  service_description             upsBatteryCapacity
  is_volatile                     0
  check_period                    24x7
  max_check_attempts              3
  normal_check_interval           1
  retry_check_interval            1
  contact_groups                  Admins
  notification_interval           0
  notification_period             24x7
  notification_options            c,r
  check_command                   check_snmp!1.3.6.1.4.1.318.1.1.1.2.2.1.0!85:90!91:100!mr76c0m
    }

#12

It looks like the first definition would work if you change the critical range to 85:100 instead of 85:90.

From ./check_snmp -h:
Ranges are inclusive and are indicated with colons. When specified as
'min:max' a STATE_OK will be returned if the result is within the indicated
range or is equal to the upper or lower bound. A non-OK state will be
returned if the result is outside the specified range.

So with your 85:90 critical range you will get a critical error when the batteries are charged above 90%.
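The help text above can be modelled in a few lines of Python to see why 85:90 fails and 85:100 works. This is only a sketch of the quoted 'min:max' rule, not the plugin's source; the function names are made up, and checking critical before warning is an assumption (it does agree with the results reported earlier in this thread).

```python
def in_range(spec: str, value: int) -> bool:
    """True if value lies inside an inclusive 'min:max' range."""
    low, high = (int(part) for part in spec.split(":"))
    return low <= value <= high


def check(value: int, warn: str, crit: str) -> str:
    # Assumption: the critical range is evaluated before the warning range.
    # A value OUTSIDE a range triggers that state, as the help text says.
    if not in_range(crit, value):
        return "CRITICAL"
    if not in_range(warn, value):
        return "WARNING"
    return "OK"


for value in (100, 88, 80):
    print(
        value,
        check(value, warn="91:100", crit="85:90"),   # definition from post #11
        check(value, warn="91:100", crit="85:100"),  # suggested fix
    )
```

Under this model, -w 91:100 -c 85:100 gives OK from 91 to 100, WARNING from 85 to 90, and CRITICAL below 85.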


#13
  • If specified in the order 'max:min' a non-OK state will be returned if the result is within the (inclusive) range.

That's more like what we are after: just put the figures in reverse!

In my case:

-w 3999:3000 -c 2999:0000
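As a rough Python model of how I read the two rules from the help text ('min:max' alerts outside the range, 'max:min' alerts inside it): the function names are invented for illustration, the critical-before-warning order is an assumption, and the 90:85 / 84:0 values for the UPS case are an untested guess, so check the result against your own plugin version.

```python
def violates(spec: str, value: int) -> bool:
    """True if value should raise an alert for the given range spec."""
    first, second = (int(part) for part in spec.split(":"))
    if first <= second:
        # 'min:max': alert when the value is outside the inclusive range.
        return not (first <= value <= second)
    # 'max:min': alert when the value is inside the inclusive range.
    return second <= value <= first


def check(value: int, warn: str, crit: str) -> str:
    # Assumption: critical is evaluated before warning.
    if violates(crit, value):
        return "CRITICAL"
    if violates(warn, value):
        return "WARNING"
    return "OK"


# The thresholds from this post.
for value in (4500, 3500, 2000):
    print(value, check(value, warn="3999:3000", crit="2999:0000"))

# Hypothetical equivalent for the UPS battery capacity asked about above.
for value in (100, 88, 80):
    print(value, check(value, warn="90:85", crit="84:0"))
```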

[ Edited Thu Dec 22 2005, 11:57AM ]