We just set up Nagios a few weeks ago. Today the connection between Nagios and the internet was broken for about 7 minutes. After that, Nagios sent out 118 emails and SMS pages to tell us everything was down, then another 118 emails and SMS pages to tell us everything (118 servers) was up.
Is there a way to get Nagios to send out one mail when multiple hosts and services go down at once, like “Hey, LOTS of stuff just went down”, instead of “Server went down” 118 times and “Server came up” 118 times? I searched for this and could find nothing. I found lots of people misunderstanding the question and suggesting “Flap Detection”, but nothing actually related to “Everything went down at once.”
Any suggestions, work-arounds, scripts, or ideas about this?
If you use the “parents” directive in a host definition to name the hosts it sits behind, Nagios will check those hosts first. When a parent is down, the hosts behind it are treated as unreachable rather than down, so they do not each have to raise their own “down” alert. This requires a change to each host definition, but has the advantage that the status map will reflect the topology.
Another way to do this is to create a file that contains nothing but hostdependency definitions. So you could list a router and then all hosts directly connected to it. If “inherits_parent” is “1”, Nagios will check the entire path.
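As a rough sketch of the parents approach, assuming a host template called generic-host (as shipped in the sample configuration) and made-up host names and addresses, the definitions might look something like this. Leaving “u” out of notification_options is what keeps unreachable hosts quiet; adjust to taste.

```
# Hypothetical hosts; names, addresses and the template are assumptions.
define host {
    use                     generic-host
    host_name               router1
    address                 192.0.2.1
}

define host {
    use                     generic-host
    host_name               webserver1
    address                 192.0.2.10
    parents                 router1     ; webserver1 is reached through router1
    notification_options    d,r         ; notify on down/recovery, not unreachable
}
```

With this in place, an outage of router1 should produce one alert for router1 while webserver1 is marked unreachable.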
For example, imagine you have router1->router2->VirtualHost->(guest1|guest2|guest3)
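A dependency file for that chain could look roughly like the following. This is only a sketch: the host names are taken from the example above, and the choice of failure criteria (suppress notifications while the host depended on is down or unreachable) is an assumption you may want to change.

```
# router2 depends on router1
define hostdependency {
    host_name                       router1
    dependent_host_name             router2
    notification_failure_criteria   d,u     ; no notifications while router1 is down/unreachable
}

# VirtualHost depends on router2, and inherits router2's own dependencies
define hostdependency {
    host_name                       router2
    dependent_host_name             VirtualHost
    inherits_parent                 1
    notification_failure_criteria   d,u
}

# The guests depend on VirtualHost, and through it on the whole path
define hostdependency {
    host_name                       VirtualHost
    dependent_host_name             guest1,guest2,guest3
    inherits_parent                 1
    notification_failure_criteria   d,u
}
```

The idea is that if router1 fails, only router1 notifies; everything further down the path stays silent until it is back.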