Notification Threshold when connection times out?


#1

We have a problem that frequently pops up in our monitoring environment. Our monitoring server has adequate bandwidth, but we share the link with an occasional bandwidth hog. Sometimes they take the bulk of our available bandwidth, leaving Nagios unable to reach the systems it monitors. The result is an onslaught of hundreds or even thousands of pages to everyone, telling us that systems and services are down… one page for each host and service that is monitored! Sometimes more, if the bandwidth is pinched for a prolonged period. Legitimate pages get lost in the flood.

Is there a way to set a threshold in Nagios for when “Time Out” messages are received from a certain number of hosts or services? For example, if 4 distinct sites are timing out at the same time, it should stop all notifications for all hosts and services and send a single page indicating a bandwidth issue. Is there any facility in Nagios that even slightly resembles what I’ve described? TIA!


#2

You could create one service that checks the bandwidth and make all other services dependent on it, so their notifications are suppressed while the bandwidth check is failing.
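
Roughly, the dependency idea might look like the sketch below. The host name monitor-gateway, the hostgroup remote-sites, the service descriptions, and the thresholds are all made up for illustration. It also assumes the generic-service template and a check_ping command taking warning and critical arguments, as in the sample config shipped with Nagios; ping latency and packet loss are only a stand-in for a real bandwidth check.

```
# Hypothetical "canary" service: goes WARNING/CRITICAL when the shared link is saturated.
define service {
    use                     generic-service     ; assumed template from the sample config
    host_name               monitor-gateway     ; hypothetical host on the far side of the link
    service_description     Uplink Bandwidth
    check_command           check_ping!200.0,20%!600.0,60%   ; example thresholds only
}

# Suppress notifications for dependent services while the canary is not OK.
define servicedependency {
    host_name                       monitor-gateway
    service_description             Uplink Bandwidth
    dependent_hostgroup_name        remote-sites    ; hypothetical hostgroup
    dependent_service_description   HTTP            ; repeat or widen for other services
    execution_failure_criteria      n               ; keep running the dependent checks
    notification_failure_criteria   w,u,c           ; but stay quiet while the canary is W, U or C
}
```

With something like this in place, the only page you should get during a bandwidth squeeze is the one for the Uplink Bandwidth service itself. Note that this covers service notifications only; host notifications would need host dependencies or parent relationships set up separately.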
Also, you could specify timeout values for each service check command. Run each check plugin from the terminal with the --help switch to see how to set its timeout value.
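
As a rough example of the timeout part, here is how a command definition could pass a timeout through to the plugin. The command name check_http_timeout and the 20 second value are just placeholders. check_http accepts -t for the timeout in seconds, but other plugins may differ, so confirm with --help first.

```
# Hypothetical command wrapping check_http with a configurable timeout.
define command {
    command_name    check_http_timeout
    command_line    $USER1$/check_http -H $HOSTADDRESS$ -t $ARG1$   ; -t = timeout in seconds
}

# Example use in a service definition: allow 20 seconds before the plugin gives up.
define service {
    use                     generic-service     ; assumed template from the sample config
    hostgroup_name          remote-sites        ; hypothetical hostgroup
    service_description     HTTP
    check_command           check_http_timeout!20
}
```

Keep the plugin timeout below the service_check_timeout value in nagios.cfg, otherwise Nagios will kill the check before the plugin's own timeout is reached.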