"we had some trouble with our network and received no notifications"
Yes, that can be a problem. But if you could have logged into the Nagios machine and looked at its web interface, you should have been able to tell what was broken on your network. Oh, you aren’t monitoring your network equipment, you say? You are monitoring hosts and services though, right? Those hosts and services aren’t worth a dime if they have no network to run on. So I’d suggest that you spend some quality time tracing cables, switches, routers, and interface cards, and make your “status map” look IDENTICAL to the way your network is wired up PHYSICALLY.
Once you have done that, when something breaks your Nagios machine will show that, say, HOST “router port e52” is DOWN and all HOSTS behind it will show as “Unreachable”. Since I don’t allow any unreachable alerts, I get only ONE alert, and that is “DOWN”. That is how you get rid of the Nagios spam in your “INBOX”. The only reason you get so many alerts is that you have no dependency setups or parent/child relationships.
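To make the DOWN versus “Unreachable” distinction concrete, here is a rough Python sketch of the idea. It is not Nagios code, only a simplified model of the logic: a host whose check fails is DOWN if it has no parents or at least one parent is still UP, and UNREACHABLE otherwise. The host names, the PARENTS map, and the FAILED set are all made up for illustration. In Nagios itself the relationships come from the `parents` directive in each host definition, and leaving `u` out of a host's `notification_options` is what keeps the unreachable alerts out of your inbox.

```python
from typing import Dict, List, Set

# Hypothetical topology: each host lists its parents, i.e. the next hop
# closer to the monitoring box. All names here are illustrative.
PARENTS: Dict[str, List[str]] = {
    "core-switch": [],
    "router-port-e52": ["core-switch"],
    "switch-port-16": ["router-port-e52"],
    "web-server": ["switch-port-16"],
    "db-server": ["router-port-e52"],
}

# Hosts whose checks got no reply (assumed input for this example).
FAILED: Set[str] = {"router-port-e52", "switch-port-16", "web-server", "db-server"}


def classify(parents: Dict[str, List[str]], failed: Set[str]) -> Dict[str, str]:
    """Label every host UP, DOWN, or UNREACHABLE (assumes no parent loops)."""
    states: Dict[str, str] = {}

    def state_of(host: str) -> str:
        if host not in states:
            if host not in failed:
                states[host] = "UP"
            else:
                up_parents = [p for p in parents[host] if state_of(p) == "UP"]
                # No parents, or a working parent: the fault is at this host.
                if not parents[host] or up_parents:
                    states[host] = "DOWN"
                else:
                    states[host] = "UNREACHABLE"
        return states[host]

    for host in parents:
        state_of(host)
    return states


def alerts(states: Dict[str, str], notify_on=("DOWN",)) -> List[str]:
    """Hosts that would trigger a notification under the given filter."""
    return [host for host, state in states.items() if state in notify_on]


if __name__ == "__main__":
    print(alerts(classify(PARENTS, FAILED)))  # ['router-port-e52']
    flat = {host: [] for host in PARENTS}     # same hosts, no parents defined
    print(len(alerts(classify(flat, FAILED))))  # 4
```

With the parent relationships in place, the four failed checks collapse into one alert that points at the broken hop. With no parents defined, the same outage produces four DOWN alerts and no hint about where to start looking.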
Bottom line is this: Nagios isn’t worth a dime if you are not watching those interface cards, the switch port that Nagios is plugged into, and the rest of the network on the path to the ACTUAL service that you are very, very interested in.
For example, the only thing I might want Nagios for is to watch the status of httpd on one of our servers. Without the network mapped out, if the net goes down, all I’m going to see is that the service check failed and the host failed. But if I have the network mapped out, and Nagios is checking every connection from my Nagios box all the way to the web server, then if the net goes down, I’ll see that the httpd box is in state “unreachable” and that the switch port 16 interface status is “DOWN”. Since port 16 is the port that the web server is plugged into, I look and find that the cable is unplugged. Net is back up in less than 10 minutes.
The above example is not far-fetched. It is why I installed Nagios in the first place. Our network went down, and we spent 45 minutes pinging this and pinging that before we finally found a likely problem, unplugged and replugged a cable in a switch, and the problem was solved (poor connection in the switch port or dirty contacts).
So now, I can fix the same problem in 10 minutes tops.