No need to get all worked up. I’m still very new to Nagios; the company where I’m a trainee uses it. I only want to learn more about it, and perhaps improve how it works for them now.
The 300 switches are not directly connected to the router; they’re connected through a lot of other switches. Sorry if I confused things. When I look at the “Status map” I can see a hierarchical view of the network, which closely resembles how the network looks out in the field.
Now the problem is this: if one of the major switches, or the upstream router, goes down, Nagios sends MANY e-mails, resulting in a lot of SMS messages. Therefore the company has disabled the checks of the “non-vital” switches, and the on-call person must check Nagios every hour or so to manually verify that the shit is still alive.
What would really improve things is if we could enable checks for ALL devices in the network, BUT, if more than 10 devices die at the same time, have Nagios send only a FIXED number of e-mails. Can this be done? If so, please tell me how or point me to a link.
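If your Nagios setup has no option for capping notifications across many hosts, one way to do it is outside Nagios, in the script that the notification command calls. Below is a minimal sketch, assuming a hypothetical `NotificationThrottle` helper that lets at most 10 notifications through per 5-minute sliding window and only counts the rest. A real wrapper runs once per notification, so it would have to store its timestamps in a file or similar between runs; this version keeps them in memory just to show the logic.

```python
from collections import deque


class NotificationThrottle:
    """Hypothetical rate limiter for a Nagios notification script.

    Lets at most `max_per_window` notifications through in any sliding
    window of `window_seconds`; the rest are counted as suppressed.
    """

    def __init__(self, max_per_window: int = 10, window_seconds: float = 300.0):
        self.max_per_window = max_per_window
        self.window_seconds = window_seconds
        self._sent = deque()  # timestamps of notifications actually sent
        self.suppressed = 0   # notifications dropped while throttled

    def allow(self, now: float) -> bool:
        # Forget notifications that have slid out of the window.
        while self._sent and now - self._sent[0] >= self.window_seconds:
            self._sent.popleft()
        if len(self._sent) < self.max_per_window:
            self._sent.append(now)
            return True
        self.suppressed += 1
        return False


if __name__ == "__main__":
    throttle = NotificationThrottle(max_per_window=10, window_seconds=300)
    # Simulate 50 hosts going down within the same minute.
    results = [throttle.allow(now=float(i)) for i in range(50)]
    print(sum(results), "sent,", throttle.suppressed, "suppressed")
```

In a real wrapper, the suppressed count could go into one final summary mail ("40 more hosts down, check the status map") instead of being silently dropped.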
Thanks for the feedback, even though I didn’t grasp how it would fix my problem.
/Gymmarn - Nagios noob
[quote=“jakkedup”]Yes, luca, it does work.
If you have 300 switches all connected to one router, I want to see that router. I’ve never seen one with that many ports.
But if what you have is 5 networks connected to one router, and each network has ONE port on ONE switch connected to it, then surely you can specify in hosts.cfg which host is the parent.
Once you have your entire network laid out and all the parents defined, your status map will look exactly like the way your network is cabled.
And if you have 500 switches connected to that router, please post a pic of it.
If you don’t want to spend the time defining which switch connects to which switch, and which host is the parent of each switch, then there is NOTHING Nagios can do to reduce your messages to just ONE. Nagios will not know which device is the actual point of failure, and neither will you.
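The parent idea can be sketched as a simplified, single-parent model of how Nagios picks between DOWN and UNREACHABLE: a failing host whose parent is UP is the failure point (DOWN), while a failing host behind a broken parent is only UNREACHABLE. The `classify` function and the host names are hypothetical, chosen just for illustration.

```python
from typing import Dict, Optional, Set


def classify(parents: Dict[str, Optional[str]], failed: Set[str]) -> Dict[str, str]:
    """Label each host UP, DOWN or UNREACHABLE in a single-parent tree.

    `parents` maps host -> parent host (None = directly behind the Nagios server).
    `failed` holds the hosts whose own check got no answer.
    """
    states: Dict[str, str] = {}

    def state(host: str) -> str:
        if host in states:
            return states[host]
        parent = parents[host]
        if host not in failed:
            result = "UP"
        elif parent is not None and state(parent) != "UP":
            # Something upstream is already broken: not this host's fault.
            result = "UNREACHABLE"
        else:
            # Parent answers (or there is none), so this host is the failure point.
            result = "DOWN"
        states[host] = result
        return result

    for host in parents:
        state(host)
    return states


if __name__ == "__main__":
    parents = {
        "eth0": None,
        "sw1port10": "eth0",
        "router": "sw1port10",
        "net1-sw": "router",
        "net2-sw": "router",
        "net2-sw-b": "net2-sw",
    }
    # The router dies, so everything behind it stops answering too.
    states = classify(parents, {"router", "net1-sw", "net2-sw", "net2-sw-b"})
    print([h for h, s in states.items() if s == "DOWN"])  # → ['router']
```

Only `router` comes out DOWN. If the switches’ `notification_options` leave out `u` (UNREACHABLE), the hosts behind it stay quiet and the DOWN router is the one that pages.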
So you are telling me that you have all this monitoring going on, but Nagios isn’t telling you anything except that your entire network is down?
When in fact all that is down is the cable in the back of your Nagios server? Well, then you didn’t use the parent feature of Nagios very well. The Nagios process on your PC connects through the PC’s eth0 port, so show that by defining an eth0 host check with no parent, so that it defaults to the Nagios process.
Now, eth0 connects to switch1/port10, so make a host called sw1port10 with an SNMP ifOperStatus check of that port, and make its parent the eth0 host… and so on. When you are done, it will look like my Nagios setup and show every connection, from my PC to the switch, to the router, to the 5 networks, to every switch in the building, and EXACTLY how they are connected.
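Typing hundreds of these definitions by hand is tedious, so a small script can write them from a list of who plugs into whom. This is a sketch, assuming a host template named `generic-switch` exists in your config and supplies the check command, and using made-up addresses from the 192.0.2.0/24 documentation range. The `render_hosts` helper is hypothetical; it only emits `define host` blocks with `parents` lines, so the SNMP ifOperStatus service checks would still need their own definitions.

```python
from typing import Dict, Optional, Tuple


def render_hosts(topology: Dict[str, Tuple[str, Optional[str]]],
                 template: str = "generic-switch") -> str:
    """Turn {host_name: (address, parent)} into Nagios host definitions.

    Hosts whose parent is None get no `parents` line, so Nagios treats
    them as sitting on the Nagios server's own network segment.
    """
    blocks = []
    for name, (address, parent) in topology.items():
        lines = [
            "define host {",
            f"    use        {template}",
            f"    host_name  {name}",
            f"    address    {address}",
        ]
        if parent is not None:
            lines.append(f"    parents    {parent}")
        lines.append("}")
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    # Hypothetical chain: Nagios PC eth0 -> switch1 port 10 -> router.
    topology = {
        "eth0": ("192.0.2.1", None),
        "sw1port10": ("192.0.2.10", "eth0"),
        "router": ("192.0.2.254", "sw1port10"),
    }
    print(render_hosts(topology), end="")
```

Keeping the topology in one place like this also makes it easy to regenerate hosts.cfg whenever a switch is moved or added.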
I know darn well that you don’t have 500 cables plugged into that router.