Hello,
I’m trying to make sense of the output I am receiving from my SNMP checks. My goal is to set thresholds for CPU load and set up alerts when those thresholds are exceeded. I’m using ./check_snmp to poll our Cisco devices.
I have this working and displaying properly in the Nagios web interface, meaning it shows the output of the ./check_snmp command, which basically runs a simple snmpget.
Here is my checkcommand.cfg:
$USER1$/check_snmp -H $HOSTADDRESS$ -C $ARG1$ -o $ARG2$ -w $ARG3$ -c $ARG4$
Here is my service check:
./check_snmp!passwd!oid!100!120
Now, this checks the Cisco device's CPU load and outputs an integer value. That integer is what confuses me: before I can specify a threshold that makes sense, I need to know what it represents.
Can anyone help me understand where to find out the meaning of the integer output for my CPU load check using check_snmp? Thanks
It just might be the same as what you get when you type
uptime
at a Linux console.
It represents the system load averages for the past 1, 5, and 15 minutes.
To keep it simple, if you don’t want to get too detailed about it: on a single-CPU box, a 1-minute load average above 2 might mean your CPU is terribly busy (on a multi-CPU box, scale that figure by the number of CPUs).
You can search around the net for a better explanation if you wish, but I’d suggest simply working with the numbers and then setting up a sensible threshold, based on what you feel is acceptable for these devices.
For example, the box I’m on now is running thousands of service checks with Nagios, and this is what I get for a load:
load average: 1.37, 1.04, 0.83
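The rule of thumb above can be sketched in Python. This is a minimal sketch, assuming the common practice of comparing a Linux load average per CPU; the helper names and the limit of 2.0 are illustrative assumptions, not anything Nagios itself uses.

```python
import os


def per_cpu_load(loads, cpus):
    """Divide each load average (1, 5, 15 min) by the number of CPUs."""
    if cpus < 1:
        raise ValueError("cpus must be at least 1")
    return tuple(load / cpus for load in loads)


def looks_busy(loads, cpus, limit=2.0):
    """Hypothetical check: is the 1-minute load per CPU above `limit`?"""
    return per_cpu_load(loads, cpus)[0] > limit


if __name__ == "__main__":
    # os.getloadavg() is only available on Unix-like systems.
    loads = os.getloadavg()
    cpus = os.cpu_count() or 1
    print("load per CPU:", per_cpu_load(loads, cpus))
    print("busy:", looks_busy(loads, cpus))
```

With the load shown above (1.37, 1.04, 0.83), even a single-CPU box would not count as busy under this limit.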
Yes, that makes sense for CPU load on a Linux server. But this is a Cisco device, so the integers I am getting are different.
For example, on one of our Linux servers, the load average is:
0.67, 0.72, 0.78
That server is monitored with Nagios. Now here is the output from the ./check_snmp command for my Cisco device:
SNMP OK - 97
That is one of the routers we are checking. Another displays output like this:
SNMP OK - 124
So, as you can see, since I don’t know what these integers mean, I can’t specify a threshold that makes sense for the Cisco device.
Does this mean that I will have to dig into the MIB of the object that I am monitoring to find the answer to the value I am getting? And would I find that info in the MIB documentation on the Cisco router itself?
[quote=“tekhed”]
Does this mean that I will have to dig into the MIB of the object that I am monitoring to find the answer to the value I am getting? And would I find that info in the MIB documentation on the Cisco router itself? [/quote]
I’m surprised that you even know the OID if you don’t know what the output is supposed to mean. I use mbrowse. It’s a MIB browser: you can look through all the MIBs and see the OID numbers, the English names, and what a query will return. If the return value is numeric, it details what each number means; e.g. for the OID that queries spanning tree port state, 2 means blocking and 5 means forwarding. You could also just read the MIB being used, and it will tell you what the results mean.
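As an illustration of the enumerated values a MIB documents, here is a small Python lookup for spanning tree port state (dot1dStpPortState in BRIDGE-MIB, RFC 1493). The table mirrors that MIB's enumeration; the function name is just for illustration, and a MIB browser gives you the same mapping without any code.

```python
# dot1dStpPortState enumeration from BRIDGE-MIB (RFC 1493).
STP_PORT_STATE = {
    1: "disabled",
    2: "blocking",
    3: "listening",
    4: "learning",
    5: "forwarding",
    6: "broken",
}


def describe_stp_state(value):
    """Translate a raw integer returned by an SNMP get into its MIB label."""
    return STP_PORT_STATE.get(value, f"unknown({value})")


print(describe_stp_state(2))  # blocking
print(describe_stp_state(5))  # forwarding
```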
I don’t know what OID you are using, so I can’t tell you what the output represents. And I doubt that I have the MIB, but I might.
Run the command by hand from a command line. Now run it again, but this time use the -m switch and specify the correct MIB to use. You will get output from the command much quicker, since it doesn’t have to search the list of MIBs on your Linux box to find the right one.
Yes, I can see how important that would be with that many SNMP checks. However, I am only going to be running a few SNMP checks from Nagios.
Back to my original question: I’m on the Cisco site right now looking for the MIB documentation for our Cisco routers. There is so much info in those docs, and it all seems to change slightly between IOS releases.
Question: where would I find specific information that tells me what the integer value means when ./check_snmp returns it for the following OID?
1.3.6.1.4.1.9.2.1.58.0
I’m currently logged on to the Cisco site and have found the MIB documents for our Cisco routers with the right IOS release. I have used some of the tools on the Cisco site that let me search for a specific MIB or OID and get a description of it.
The problem is that it still doesn’t give me the details of the integer value I need. It tells me the object name, the OID value, and the MIB name, but nothing specific about the integer value returned when the OID is polled. :/
I don’t have that MIB, but if you can find it, please attach it to your next post so I can read it with my MIB browser.
Or you could download and install mbrowse and just read it yourself.
If you know which MIB it is, you can also just view it and read it.
For example, if I wanted to know what OID .1.3.6.1.2.1.2.2.1.1 was for and what values it returns, I would just run:
view /usr/share/snmp/mibs/RFC1213.MIB
Looking at the top, I see this is MIB-II, which is .1.3.6.1.2.1.
.1.3.6.1.2.1.2 would be interfaces, according to the text file.
.1.3.6.1.2.1.2.2 would be ifTable.
.1.3.6.1.2.1.2.2.1 would be ifEntry.
.1.3.6.1.2.1.2.2.1.1, finally, would be ifIndex:
    ifIndex OBJECT-TYPE
        SYNTAX INTEGER
        ACCESS read-only
        STATUS mandatory
        DESCRIPTION
            "A unique value for each interface. Its value
            ranges between 1 and the value of ifNumber. The
            value for each interface must remain constant at
            least from one re-initialization of the entity’s
            network management system to the next re-
            initialization."
So there you have it: a description in plain English, in a text file, of my OID and what values I could expect. But I just can’t tell you about yours, since I don’t have that OID definition or the MIB anywhere.
But like I said, if you want to attach it, it’s only a small text file.
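The walk above, matching longer and longer OID prefixes against names in the MIB, can be sketched in Python. This is a minimal sketch: the `MIB_NAMES` table is hand-copied for just these few RFC1213-MIB objects rather than parsed from the MIB file, which is the job a MIB browser or the net-snmp tools do for you.

```python
# Hand-copied subset of RFC1213-MIB names, keyed by OID as a tuple of ints.
MIB_NAMES = {
    (1, 3, 6, 1, 2, 1): "mib-2",
    (1, 3, 6, 1, 2, 1, 2): "interfaces",
    (1, 3, 6, 1, 2, 1, 2, 2): "ifTable",
    (1, 3, 6, 1, 2, 1, 2, 2, 1): "ifEntry",
    (1, 3, 6, 1, 2, 1, 2, 2, 1, 1): "ifIndex",
}


def parse_oid(oid):
    """Turn a dotted OID string (leading dot optional) into a tuple of ints."""
    return tuple(int(part) for part in oid.strip(".").split("."))


def resolve(oid):
    """Return the names of every known prefix, plus the unmatched tail.

    The tail is typically the instance index (e.g. which interface).
    """
    parts = parse_oid(oid)
    names = []
    matched = 0
    for end in range(1, len(parts) + 1):
        name = MIB_NAMES.get(parts[:end])
        if name is not None:
            names.append(name)
            matched = end
    return names, parts[matched:]


print(resolve(".1.3.6.1.2.1.2.2.1.1.3"))
# (['mib-2', 'interfaces', 'ifTable', 'ifEntry', 'ifIndex'], (3,))
```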
OK, I think I found the answer to the question. First, I had to step back and take a look at exactly what I was polling. Here is another OID that I am getting data from: 1.3.6.1.4.1.9.2.2.1.1.24
So, if you go to the Cisco site, you can search for either the specific MIB or the OID you are looking for, and it will tell you the object name of the OID you are polling.
Then, on the Cisco site, you can search for that object name (or Google it) to find that specific object and what its returned value means.
I was taken to a page on the Cisco site that describes the different objects and their corresponding values. Here is the one for the OID I am polling, which is locIfLoad:
“Provides the loading factor of the interface. The load on the interface is calculated as an exponential average over 5 minutes and expressed as a fraction of 255 (255/255 is completely saturated). Used by Interior Gateway Routing Protocol (IGRP)”
Anyway, that appears to be what the integer means when I issue the ./check_snmp command against that OID. ;p
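Given the description quoted above (an interface load expressed as a fraction of 255), converting between raw readings and percentages makes thresholds easier to choose. This is a minimal sketch; the function names are illustrative, and the 80% and 90% levels are arbitrary examples, not recommendations.

```python
LOAD_SCALE = 255  # locIfLoad: 255/255 means the interface is saturated


def load_to_percent(raw):
    """Convert a raw locIfLoad reading (0-255) to a percentage."""
    if not 0 <= raw <= LOAD_SCALE:
        raise ValueError(f"locIfLoad out of range: {raw}")
    return raw * 100 / LOAD_SCALE


def percent_to_raw(percent):
    """Convert a desired percentage threshold back to the 0-255 scale."""
    if not 0 <= percent <= 100:
        raise ValueError(f"percent out of range: {percent}")
    return round(percent * LOAD_SCALE / 100)


for reading in (97, 124):
    print(f"{reading} -> {load_to_percent(reading):.1f}%")

# Example thresholds for -w/-c, expressed on the raw 0-255 scale.
print("-w", percent_to_raw(80), "-c", percent_to_raw(90))
```

On this scale, the original thresholds of 100 and 120 correspond to roughly 39% and 47% load. In the standard Nagios plugin threshold format, a plain number N given to -w or -c means the check alerts when the value goes above N.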