SNMP and Switch Ports as hosts


#1

Well, I’ve read jakkedup’s posts on the subject and am attempting to set up Nagios to show my switch ports. I currently have Nagios working, and working well. My question is about SNMP. It is working, and I can check the ifOperStatus of each port by hand. How do you set it up as a service? Mainly, I don’t want to have to set up the same service 24 times for 24 different ports just because the OID changes for each one (the last number in the OID is the number of the port).

I’m sure I’m missing something with templates and whatnot, but I have searched and can’t find what I’m looking for. It could also be that I’m not choosing the right search terms.

I want to get my active box set up nicely before setting up my central server for distributed monitoring, although that is coming in the next few weeks.

Thanks!


#2

24 ports? Why 24? It seems to me that you should only be concerned about the network and any major hosts/servers that are online 24/7. If you do in fact have a switch with all 24 ports being used as trunks and for important (24/7) servers, then OK. But not for some user host that is powered up and down several times a day.

Anyway, yes, use the template directive, and only define directives in the template that will be used for every single service. You will have to define a service for each and every host (ports 1-24). Each service will use the default template and will add 2 or 3 lines of its own.
Like this.

[code]define service{
    name                            remote-nagios   ; The 'name' of this service template, referenced in other service definitions
    contact_groups                  Network
    passive_checks_enabled          1       ; Passive service checks are enabled/accepted
    parallelize_check               1       ; Active service checks should be parallelized (disabling this can lead to major performance problems)
    obsess_over_service             0       ; Do not obsess over this service
    freshness_threshold             1200    ; Default is 0, which automatically determines the threshold
    notifications_enabled           0       ; Service notifications are disabled
    event_handler_enabled           0       ; Service event handler is disabled
    flap_detection_enabled          0       ; Flap detection is disabled
    process_perf_data               0       ; Do not process performance data
    retain_status_information       1       ; Retain status information across program restarts
    retain_nonstatus_information    1       ; Retain non-status information across program restarts
    is_volatile                     0
    check_period                    24x7
    normal_check_interval           5
    notification_interval           120
    notification_period             24x7
    notification_options            w,c,r
    service_description             fping
    servicegroups                   fping
    check_command                   service_is_stale
    active_checks_enabled           0
    check_freshness                 1
    max_check_attempts              1
    retry_check_interval            5
    register                        0       ; DON'T REGISTER THIS DEFINITION - IT'S NOT A REAL SERVICE, JUST A TEMPLATE!
}

define service{
    use                             remote-nagios
    host_name                       SwitchA-1p23
    service_description             IFSTATUS-1p23
    check_command                   check_snmp!public!.1.3.6.1.2.1.2.2.1.8.23!1!ifOperStatus!RFC1213-MIB
    servicegroups                   snmp_checks
    process_perf_data               0
}[/code]

So there is really no way around it. A few of these directives have to be unique for each service, like:
host_name
service_description
check_command #NOTE: my command is commented out, since the check command is being run remotely and the output sent via NSCA.
servicegroups #may or may not have to be unique
process_perf_data #may or may not have to be unique

So, at the least, you have to have 4 directives defined for each service:
use
host_name
service_description
check_command #depends on whether it’s a passive or active check
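
For reference, a check_command such as check_snmp!public!.1.3.6.1.2.1.2.2.1.8.23!1!ifOperStatus!RFC1213-MIB needs a matching command definition in your command config. The actual definition isn't shown in this thread, so the following is only a sketch: it assumes the five arguments are community, OID, critical range, label and MIB, in that order, and that $USER1$ points at your plugin directory.

[code]# Hypothetical command definition; the argument order is an assumption, not taken from the thread.
define command{
    command_name    check_snmp
    command_line    $USER1$/check_snmp -H $HOSTADDRESS$ -C $ARG1$ -o $ARG2$ -c $ARG3$ -l $ARG4$ -m $ARG5$
}[/code]

ifOperStatus returns 1 for up and 2 for down, so on plugin versions that use the standard Nagios range syntax a critical range of 1 should flag a port that is down. Check the behaviour by hand against your own plugin version before relying on it.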

FYI, it’s not the same service, so get that out of your head. Each port on the switch that is important enough to monitor is important enough to have a service defined for that port only. Remember, this port on the switch is actually a Nagios hostname, so that one hostname can have many services: traffic_in_out, ifOperStatus, stp_status (spanning tree status), duplex, etc.

Get fancy too, if you have VLANs or MLTs set up.
For example, in your status map/hosts.cfg:
switch-A connects to VLAN-A which connects to switch-A-1p24 and also switch-A-2p24.
switch-A-1p24 connects to server-A-eth0
switch-A-2p24 connects to Router-A-if4
server-A-eth0 connects to Server-A
Router-A-if4 connects to Router-A

Now your status map looks great. Go ahead and unplug all the cables from that section of the network; using your Nagios status map, you can plug everything back in just like it was before.
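
In hosts.cfg, a chain like this is expressed with the parents directive. A rough sketch of one branch, assuming a host template called generic-host exists and using placeholder 192.0.2.x addresses (both are made up for illustration):

[code]# Sketch only: generic-host and the 192.0.2.x addresses are placeholders.
define host{
    use         generic-host
    host_name   switch-A
    address     192.0.2.10
}

define host{
    use         generic-host
    host_name   VLAN-A
    address     192.0.2.10      ; assumption: a VLAN has no IP of its own, so reuse the switch's
    parents     switch-A
}

define host{
    use         generic-host
    host_name   switch-A-1p24
    address     192.0.2.10      ; port host, queried through the switch
    parents     VLAN-A
}

define host{
    use         generic-host
    host_name   server-A-eth0
    address     192.0.2.20
    parents     switch-A-1p24
}

define host{
    use         generic-host
    host_name   Server-A
    address     192.0.2.20
    parents     server-A-eth0
}[/code]

Each host names the device one hop closer to the Nagios server as its parent, which is what the status map draws its lines from.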

PS: watch out, some switches don’t have the ports numbered 1 through whatever, especially when it’s a stack of switches.
So for a 2-unit stack you might think they will be 1-48, but I’ll bet they are not. Double-check by using your MIB browser (mbrowse).


#3

Yeah, I understand that I don’t need all 24 ports on a switch watched, but I will have quite a few since we have over 100 servers that make us money.

I have the VLANs set up in Nagios, arranged so I know what machines are on what VLAN on what switch, so my status map is nice, but I know it can be nicer. I’m sure you know what I mean. :)

I don’t have any stacked switches, but I’m using my MIB browser to determine what ports are what. I figure I will start with my critical servers (DNS, AD, etc.) and then move to my live web servers.

I really need to get NSCA set up and working as well, mainly for me though.

Thanks for the explanation!


#4

Yup, if it’s a 24/7 server, then I would map it out completely too.
serverA > serverA-eth0 > switchA-1p3 > switchA > switchA-1p24 > routerAintf-1 > routerA and so on, all the way to your Nagios server’s eth0 card.
Sounds like you are on the right track. I know that it’s a lot of configuring to do in services.cfg, but hey, it’s worth it. My services.cfg is 11378 lines.


#5

A couple of other quick questions. When you define the switch port as a host, what do you use for the address? Just the management IP of the switch? That is how I’ve done it, but I was curious whether that is the best way.

Also, and this is just me trying to be pretty, what icon did you use for the switch ports in your status map?

Thanks!


#6

The IP to use would be the IP that is needed to query the device for SNMP information. If that IP is not available, then the service check is going to fail, and the host check is going to fail too. There’s not much else we can do: since the switch no longer responds to ping or SNMP checks, we have to mark it down at that point. Even if the switch is still operational from a user standpoint, it is no longer manageable, so it should be fixed/rebooted.

One thing that I did for a few switches was to take a screenshot, using ksnapshot, of the switch as viewed with the manufacturer’s GUI tool. The GUI tool shows the switch as if you were standing in front of it, so it’s a nice pic. Then I shrunk it down. It looks OK, but could be a lot better. I’ve been meaning to read about how to make Nagios icons, but haven’t got around to it yet. I’m sure there are rules to follow.


#7

I found a solution to make only one service for each switch.

I created one host for each port.
But in the address directive I put only the last value of the corresponding OID (1 for the first port, 2 for the second, …).
Then I added all these hosts to a hostgroup.

Then for the check_command of the service, I put check_snmp!IP of the switch

In the command.cfg file I modified the check_snmp command to take $ARG1$ for the -H option and ifOperStatus.$HOSTADDRESS$ for the OID.
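
A minimal sketch of this setup, under some assumptions: the command is given its own hypothetical name (check_port_status) rather than overwriting check_snmp, the community string is hard-coded as public, generic-host and generic-service are assumed templates, 192.0.2.10 stands in for the switch's management IP, and the numeric OID for ifOperStatus is used so no MIB needs to be loaded.

[code]# Sketch only: command name, templates, community and IP are placeholders.
define command{
    command_name    check_port_status
    command_line    $USER1$/check_snmp -H $ARG1$ -C public -o .1.3.6.1.2.1.2.2.1.8.$HOSTADDRESS$ -c 1
}

define hostgroup{
    hostgroup_name  SwitchA-ports
    alias           Ports on SwitchA
}

define host{
    use         generic-host
    host_name   SwitchA-port1
    address     1               ; last number of the OID, not an IP address
    hostgroups  SwitchA-ports
}

define service{
    use                  generic-service
    hostgroup_name       SwitchA-ports
    service_description  ifOperStatus
    check_command        check_port_status!192.0.2.10
}[/code]

One service definition then covers every port host in the hostgroup. Note that because the address field no longer holds an IP, a ping-based host check cannot work for these hosts and would have to be replaced with something else.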

I’m a new user of Nagios, so is this correct?

And how do I make Nagios mark this host down and show it in red in the status map?
Actually, I think it’s better to declare these ports as services, just because you won’t have multiple hosts in the status map and each port will be associated with the switch (host).


#8

"And how do I make Nagios mark this host down and show it in red in the status map?"
You would have to have a “host check” that is going to fail. I thought of that, and there really isn’t much you can do about it. Think of it this way.
These ports are shown as “hosts” on the status map the same way an ftp server would be shown as a host, right? On the status map, an ftp server can be shown as UP/green while the service check on that host (the ftp check) is failing. The same applies to these ports. The actual “HOST” that this port is on is the switch. The switch is operational, but the port may be down. Remember, this “switch port” host may contain service checks that are NOT failing. A port can have many different service checks, not just ifOperStatus.

Bottom line is, the status map is just that: a map of how this connects to that, not just SwitchA connecting to SwitchB in some mysterious fashion. That is ALL that I can see the map is good for.


#9

“Actually, I think it’s better to declare these ports as services, just because you won’t have multiple hosts in the status map and each port will be associated with the switch (host).”

I suppose… but if the host is the switch itself and the ports are simply service checks for that switch, then the status map will show you the switch and, connected to it, let’s say, 10 ftp servers. But how do these ftp servers connect to the switch? Which port/cable do I have to replace when ftp server #3 is shown as down, yet it really isn’t, and only the cable connection on the switch has gone bad? Which cable do I replace or reseat?

I’ve seen it happen just that way many times. So, during production, you are tracing the cable out of ftp server #3 to patch panel port #625. The patch panel goes to some other room. Now trace patch #625 through the spaghetti and find that it plugs into port 4 of some switch. Too dang hard, and too much time wasted during production.

With Nagios set up to show ports as hosts, you can see from the “service problems” page that there is a problem with ftp server #3 and the host “switchA-port4”. I go to the switch, replace the cable or reseat it, or find that someone has unplugged it, and we’re back online in minutes.


#10

I prefer having the switch as a host and then having a service per switch port. The last number or two in the OID correspond to the switch port, so you can have a quick look at your services.cfg and easily see which switch port might be down/unplugged.
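
A short sketch of that layout, assuming a generic-service template and a host named SwitchA (both placeholders), and reusing the check_snmp argument layout shown earlier in the thread:

[code]# Sketch only: one host (the switch), one service per port.
define service{
    use                  generic-service
    host_name            SwitchA
    service_description  Port 23 ifOperStatus
    check_command        check_snmp!public!.1.3.6.1.2.1.2.2.1.8.23!1!ifOperStatus!RFC1213-MIB
}

define service{
    use                  generic-service
    host_name            SwitchA
    service_description  Port 24 ifOperStatus
    check_command        check_snmp!public!.1.3.6.1.2.1.2.2.1.8.24!1!ifOperStatus!RFC1213-MIB
}[/code]

Only the service_description and the last number of the OID change from port to port.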


#11

It’s a matter of preference, I suppose. But it might help if you look at it this way. I know for a fact that we don’t have schematics of our entire network here. So, how does switchA connect to switchB? With the way you are doing it, “MP”, you won’t know, nor will anyone else who looks at the status map. All they will see is that A connects to B. If the cable running crew screws up, bumps into your fiber cables and yanks the fibers out, can any tech plug them back in correctly within 5 minutes? Nope, I don’t think so. They will have to figure it out.

If you set up Nagios to be a schematic drawing of your network, as I have, they can look at the Nagios status map and easily see that switchA port 23 connects to switchB port 1, switchA port 24 connects to switchC port 2, and so on. There is no guesswork involved and no time lost reinventing the wheel. As a matter of fact, you could remove every single trunk cable from our entire network, and I would have a status map that can be used to cable it back up identically to the way it was. Can you say that’s true of your network, and would just ANY tech, even one who doesn’t know the network, be able to do it? I seriously doubt they could from the information shown on your status map.

Connecting switchA to switchB just doesn’t show enough information. I suppose if that is all you want… But why not take that one extra step and make a super accurate schematic drawing too?


#12

PS, you are going to be checking the status of each port anyway. Why not just take that extra step and go ahead and make each one a host?


#13

Sounds good, and also like quite a lot of work. How many hosts and services have you got now? And how many physical switches are you monitoring? What kind of switches and routers do you check: Cisco, Nortel? I’m looking for Nortel plugins!


#14

Nortel/Cisco/etc. makes no difference. You are simply executing the check_snmp plugin and getting whatever info you can find out about a device. I was amazed at how much info you can get out of both Nortel and Cisco. Fan speed, temp of case, temp of fan exhaust, fan rpm, redundant power supply status, if it likes extra cheese on its pizza, you name it, it’s there. So I monitor via the check_snmp plugin most everything that is important, like power/fans/interface status, spanning tree blocking/forwarding, topology changes, and on and on… Just one plugin, using a different OID number, is all…
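
To illustrate the "one plugin, different OID" idea, here is a sketch of two services on the same port host that differ only in the OID they query. It reuses the remote-nagios template and SwitchA-1p23 host from earlier in the thread and sticks to the standard interface table (ifAdminStatus is column 7, ifOperStatus is column 8); vendor-specific OIDs for fans, temperature and power supplies differ per device and are not shown.

[code]# Sketch only: same plugin and template, different OID per service.
define service{
    use                  remote-nagios
    host_name            SwitchA-1p23
    service_description  IFADMIN-1p23
    check_command        check_snmp!public!.1.3.6.1.2.1.2.2.1.7.23!1!ifAdminStatus!RFC1213-MIB
}

define service{
    use                  remote-nagios
    host_name            SwitchA-1p23
    service_description  IFSTATUS-1p23
    check_command        check_snmp!public!.1.3.6.1.2.1.2.2.1.8.23!1!ifOperStatus!RFC1213-MIB
}[/code]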

I’m checking every switch/router and the ports used to provide the network backbone, plus the ONE PORT used for the mission critical device(server) plus the server services. Right now, that’s over 1200 checks. Plus my favorite, Oracle tablespace free checks, plus graphs. Don’t have a count of actual switches/routers. Somewhere around 150.


#15

Oh, and yes, since we have no schematics of how the network is set up, the company previously relied on the one or 2 people that knew. By installing Nagios and setting it up the way I did, I have now physically created a living schematic, and it gets updated when we change things around. It has also enabled me to become the guy who knows just what is where and how this stuff all works. Side benefit is I’m now a senior systems technician, the highest scale we have in this department. Otherwise, I would still be fixing printers and PC mice…


#16

I see where you’re coming from, but even if you know right away what device is supposed to plug into what port, you’d still have to have proper physical labeling to know which of a handful of wires connects to what. I’m lucky enough to take care of a discontiguous network with elements separated by the internet; it’s a lot more server based than network gear. All I need from switch ports are bandwidth statistics. Trunks are rare, and a typical setup is just two VLANs along with static and DHCP’d hosts, so which port goes where doesn’t matter a whole lot. It’s still all well documented though.

That being said, I am going to use your example if I am ever administering a network where it would make sense (large office, college network, etc.), as it’s a good idea, especially when documentation is lackluster (or not kept up to date when changes are made).


#17

I do like this idea most of all.
However, I’d like to ask: is there a way to check only for a change of state on the ports of my management switch? Not all the ports have something plugged in yet. I need to know if someone plugs a cable into, or unplugs one from, my management switch. Let’s say I have 1 switch defined as a host and every port is a different service (e.g. 24-port switch = 24 services).
I just want a state change notification, not critical alerts. How would I do that? Can someone help or give some ideas?
All appreciated.