Switches are another of my favorite topics, so I’m going to elaborate a bit. This is how I deal with your problem of “redundant paths to the same switch”, which is exactly how our network is set up too.
First, get yourself a couple of tools. Mbrowse is an SNMP MIB browser available at kill-9.org/mbrowse/, and Java Device Manager (Java Device Manager v5.7.9.0 for BayStack 5510 10/100/1000 Switch v4.0.0 and v4.0.1) is a Nortel switch configuration tool available from the Nortel website. Use mbrowse to look at a switch, router, or any other SNMP-enabled device and browse the large amount of information it exposes. You can also use Device Manager to view the status of your switches. Anything you can see in Device Manager can also be viewed with mbrowse.
For the port checks, the ONLY Nagios plugin you will be using is check_snmp.
My Nagios PC is connected to a switch (switch A) via its eth0 port. I want to show this port, and its status, on my status map. The port status is ifOperStatus, OID “.1.3.6.1.2.1.2.2.1.8”, and the value configured in Nagios as expected is “1”, which means the port is up. That OID is the column for all interfaces, so to query one port you append its ifIndex (2 is used below as an example for eth0; look up yours with mbrowse). Do not use check_ifoperstatus (it takes longer than check_snmp). So the Nagios check command looks like this:
check_snmp!public!.1.3.6.1.2.1.2.2.1.8.2!1!ifOperStatus!RFC1213-MIB
The check_snmp command definition looks like this:
command_line $USER1$/check_snmp -H $HOSTADDRESS$ -C $ARG1$ -o $ARG2$ -r $ARG3$ -l $ARG4$ -m $ARG5$
If you don’t use the -m switch, the command takes a huge amount of time, because it then searches through every MIB installed on your Nagios PC in /usr/share/snmp/mibs/. With -m it uses only the MIB you name, so make sure that MIB is installed there.
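For reference, here is a sketch of how that command line might sit inside a complete command definition, together with a service that uses it for the eth0 port. It uses Nagios 2.x-style object syntax; the generic-service template is assumed to exist (as in the Nagios sample configs), and the .2 suffix is a hypothetical ifIndex for eth0.

```
# check_snmp wrapper: $ARG1$ community, $ARG2$ OID, $ARG3$ regex for the
# expected value, $ARG4$ label, $ARG5$ MIB list passed to -m
define command{
        command_name    check_snmp
        command_line    $USER1$/check_snmp -H $HOSTADDRESS$ -C $ARG1$ -o $ARG2$ -r $ARG3$ -l $ARG4$ -m $ARG5$
        }

# ifOperStatus of eth0; generic-service is an assumed template and
# .2 is a hypothetical ifIndex (confirm yours with mbrowse)
define service{
        use                     generic-service
        host_name               eth0
        service_description     ifstatus
        check_command           check_snmp!public!.1.3.6.1.2.1.2.2.1.8.2!1!ifOperStatus!RFC1213-MIB
        }
```

With -r, check_snmp reports OK when the returned value matches the regex and CRITICAL when it does not, so a port that goes down(2) turns the service CRITICAL.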
Now that we know the status of the NIC, we want to know the status of the interface it plugs into on the switch, so use the same command, but change the OID to the correct one for that switch port and point the check at the switch’s address. When you configure hosts.cfg, make the parent of the switch port “eth0”, and give the switch itself (use fping for the switch itself) a parent of the switch port. YES, each one is an individual host, and here is why: when you look at the status map, you will see a line from Nagios to the eth0 NIC, a line to the switch port, and then a line to the switch. Anyone can now remove all the cables from your PC and the switch, and you will know exactly how to cable it back up just from your Nagios setup (how cool is that?).
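A minimal hosts.cfg sketch of that first chain might look like this. The IP addresses (from the 192.0.2.0/24 documentation range), the ifIndex values, and the generic-host template are hypothetical placeholders; check-host-alive is assumed to be a ping-based host check like the one in the Nagios sample configs.

```
define host{
        use             generic-host
        host_name       eth0
        alias           Nagios PC eth0
        ; hypothetical address of the Nagios PC, whose snmpd reports eth0
        address         192.0.2.10
        ; no parents line, so Nagios treats this host as directly attached
        check_command   check_snmp!public!.1.3.6.1.2.1.2.2.1.8.2!1!ifOperStatus!RFC1213-MIB
        }

define host{
        use             generic-host
        host_name       SwitchAport1
        alias           Switch A port 1 (cabled to Nagios eth0)
        ; hypothetical management address of switch A: the switch reports this port
        address         192.0.2.1
        parents         eth0
        ; hypothetical ifIndex 1 for port 1
        check_command   check_snmp!public!.1.3.6.1.2.1.2.2.1.8.1!1!ifOperStatus!RFC1213-MIB
        }

define host{
        use             generic-host
        host_name       SwitchA
        alias           Switch A
        address         192.0.2.1
        parents         SwitchAport1
        check_command   check-host-alive
        }
```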
Now on to the redundant part of the network. Switch A is connected to another switch, which is connected to the router. We want to show the status of these connections, and we want to show every port involved (as an individual host, for the same reason as above; cool, huh?).
Switch A connects to switch B via 2 fiber connections (or, in your case, maybe 2 copper ports). We use spanning tree to block one of the fiber connections so that there is no network loop. You could also connect them with a multilink trunk. We want to show that A is connected to B with 2 cables, so create 5 more hosts.cfg entries: SwitchAport23, SwitchAport24, SwitchBport1, SwitchBport2 and SwitchB.
Configure services.cfg to check the “ifstatus” of each port, using the correct OID for each port. For the switch itself, simply use check_ping or check_fping for the service. “Obviously the switch must be up if the port is up, so why are you bothering to ping the switch every 5 minutes?” you ask. Well, I want to show every important connection and every switch on the status map. That way, someone can rip all the cables out of our entire network, and I can cable it back up identically to the way it was.
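A sketch of those services.cfg entries for switch A’s uplink ports and for switch B itself follows. The ifIndex values 23 and 24 are hypothetical (on some switches ifIndex does not equal the front-panel port number, so confirm them in mbrowse), and check_ping is assumed to be a command taking warning and critical thresholds as in the Nagios sample configs. SwitchBport1 and SwitchBport2 follow the same pattern as the two port services, with their own ifIndex suffixes.

```
define service{
        use                     generic-service
        host_name               SwitchAport23
        service_description     ifstatus
        ; hypothetical ifIndex 23 for port 23 on switch A
        check_command           check_snmp!public!.1.3.6.1.2.1.2.2.1.8.23!1!ifOperStatus!RFC1213-MIB
        }

define service{
        use                     generic-service
        host_name               SwitchAport24
        service_description     ifstatus
        ; hypothetical ifIndex 24 for port 24 on switch A
        check_command           check_snmp!public!.1.3.6.1.2.1.2.2.1.8.24!1!ifOperStatus!RFC1213-MIB
        }

define service{
        use                     generic-service
        host_name               SwitchB
        service_description     PING
        ; warning at 100 ms / 20% loss, critical at 500 ms / 60% loss
        check_command           check_ping!100.0,20%!500.0,60%
        }
```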
The status map will now look like this:
Nagios–eth0–SwitchAport1–SwitchA–SwitchAport23–SwitchBport1–SwitchB
and, for the second (redundant) link:
SwitchA–SwitchAport24–SwitchBport2–SwitchB
Parents for the hosts.cfg go like this:
eth0 has no parent (so it defaults to the Nagios process)
SwitchAport1 parent is eth0
SwitchA parent is SwitchAport1
SwitchAport23 parent is SwitchA
SwitchAport24 parent is SwitchA
SwitchBport1 parent is SwitchAport23
SwitchBport2 parent is SwitchAport24
SwitchB parent is SwitchBport1,SwitchBport2
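Those parents translate into hosts.cfg entries roughly like the following sketch. As before, the addresses (192.0.2.1 for switch A, 192.0.2.2 for switch B), the ifIndex values, and the generic-host and check-host-alive names are hypothetical placeholders. Note that SwitchB lists both ports as parents, so Nagios only marks it UNREACHABLE when both paths are down.

```
define host{
        use             generic-host
        host_name       SwitchAport23
        address         192.0.2.1
        parents         SwitchA
        check_command   check_snmp!public!.1.3.6.1.2.1.2.2.1.8.23!1!ifOperStatus!RFC1213-MIB
        }

define host{
        use             generic-host
        host_name       SwitchAport24
        address         192.0.2.1
        parents         SwitchA
        check_command   check_snmp!public!.1.3.6.1.2.1.2.2.1.8.24!1!ifOperStatus!RFC1213-MIB
        }

define host{
        use             generic-host
        host_name       SwitchBport1
        ; switch B reports its own ports, so use switch B's address
        address         192.0.2.2
        parents         SwitchAport23
        check_command   check_snmp!public!.1.3.6.1.2.1.2.2.1.8.1!1!ifOperStatus!RFC1213-MIB
        }

define host{
        use             generic-host
        host_name       SwitchBport2
        address         192.0.2.2
        parents         SwitchAport24
        check_command   check_snmp!public!.1.3.6.1.2.1.2.2.1.8.2!1!ifOperStatus!RFC1213-MIB
        }

define host{
        use             generic-host
        host_name       SwitchB
        address         192.0.2.2
        ; two parents: one per redundant link
        parents         SwitchBport1,SwitchBport2
        check_command   check-host-alive
        }
```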
You should be able to see the power in this. How many people have a complete diagram of their network laid out in Nagios? How many people could remove every cable from their network and then recreate it from their Nagios “status map”? Not many, I’m sure.
Now that you have all of these ports and switches configured, you can move on to more details about your switches. If you are using spanning tree to block one path (as we are), we want to know if that path ever changes, so we use check_snmp to query the port’s spanning tree (STP) state and see whether it is blocking or forwarding. If the port is not in its “normal” state, Nagios alerts us that the port has changed from blocking to forwarding, or vice versa.
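One way to express that check, assuming your switch supports the standard BRIDGE-MIB: dot1dStpPortState (.1.3.6.1.2.1.17.2.15.1.3) returns 2 for blocking and 5 for forwarding, indexed by bridge port number. The sketch below assumes port 23 is the normally forwarding link and port 24 the normally blocked one; those port numbers are hypothetical, and bridge port numbers do not always equal ifIndex, so confirm them in mbrowse.

```
define service{
        use                     generic-service
        host_name               SwitchAport23
        service_description     stp-state
        ; active link: expect forwarding(5)
        check_command           check_snmp!public!.1.3.6.1.2.1.17.2.15.1.3.23!5!dot1dStpPortState!BRIDGE-MIB
        }

define service{
        use                     generic-service
        host_name               SwitchAport24
        service_description     stp-state
        ; standby link: expect blocking(2)
        check_command           check_snmp!public!.1.3.6.1.2.1.17.2.15.1.3.24!2!dot1dStpPortState!BRIDGE-MIB
        }
```

If spanning tree swaps the two links, both services stop matching their expected value and go non-OK, which is exactly the change you want to hear about.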
You could add check_snmp checks for the status of the 3 fans in each switch and for the power supply status. On your routers you can use check_snmp for the fans, redundant power supplies, CPU usage, and STP changes (spanning tree topology changes). If you have VLANs on your switches, you will want to show them on your status map, so that when you look at it, it is clear why hostA is connected to vlan-Business and hostB is connected to vlan-accounting. Use VlanROWStatus to get the status of a VLAN. On a switch you can actually have a VLAN configured but not enabled, so this gives you a way to show that in the Nagios status map.
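As a sketch of the VLAN idea, a VLAN can be its own host whose parent is the switch, with the hosts on that VLAN as its children. The row-status value 1 means active in the standard SNMPv2 RowStatus convention, so a configured-but-disabled VLAN fails the check. The OID, VLAN ID, and MIB name are left as placeholders because they depend on your switch’s vendor MIB (find them in mbrowse); the addresses and template names are hypothetical.

```
define host{
        use             generic-host
        host_name       vlan-Business
        alias           Business VLAN on switch A
        address         192.0.2.1
        parents         SwitchA
        ; <VlanROWStatus-OID>, <vlan-id> and <vendor-MIB> are placeholders
        check_command   check_snmp!public!<VlanROWStatus-OID>.<vlan-id>!1!VlanROWStatus!<vendor-MIB>
        }

define host{
        use             generic-host
        host_name       hostA
        alias           hostA (on the Business VLAN)
        ; hypothetical address
        address         192.0.2.50
        parents         vlan-Business
        check_command   check-host-alive
        }
```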
Put as much detail about your network connectivity in as you can, and I guarantee that your boss will be amazed that someone finally has a network diagram. Not only that, but it tells you when it’s broken. Nagios will show you which cable the electricians cut, or which cable some tech unplugged by mistake, since the status map will show everything “UP” up to the break, and the rest will be “UNREACHABLE”. If you have set up your hosts.cfg file as I have described, the only device that will show “DOWN” is the first device in your network chain whose check fails. If you have configured your notifications correctly, the only notification you will get will be the “DOWN” one, since I don’t care about any “UNREACHABLE” alerts.
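In Nagios 2.x-style syntax, one way to get only the DOWN alert is to leave u (unreachable) out of the host notification options, both on the hosts and on the contact. In the sketch below, the template name network-device, the contact details, and the notify-by-email / host-notify-by-email commands (as found in the Nagios sample configs) are assumptions; the port and switch hosts would then use network-device instead of generic-host.

```
# Host template: notify on DOWN (d) and RECOVERY (r) only
define host{
        name                    network-device
        use                     generic-host
        notification_options    d,r
        register                0
        }

define contact{
        contact_name                    netadmin
        alias                           Network admin
        service_notification_period     24x7
        host_notification_period        24x7
        service_notification_options    w,c,r
        ; no u here either, so UNREACHABLE host alerts are never sent
        host_notification_options       d,r
        service_notification_commands   notify-by-email
        host_notification_commands      host-notify-by-email
        email                           netadmin@example.com
        }
```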
[Edited Thu May 19 2005, 07:45AM]