Hi,
I have a simple setup running Nagios 2.3.1 (actually NagiosVMA, configured using Groundwork Monarch). I have a host with many services on it (both remote checks performed using check_by_ssh and checks for various public services), and I get a barrage of notifications whenever something goes wrong.
So, I defined dependencies. The services on a host are dependent on the host (and/or on the PING service, and some other general services). But there are some timing issues that mean I still get hammered with a lot of alerts. What happens during outages is this:
- The host itself goes down, but not before a few services manage to go down individually (so I get a few service down notifications)
- Services that go down while the host is already down do not generate any notifications - good.
- When the outage is over, Nagios detects the host is up.
- Now the services are detected as going up, and since the host is up there is no dependency to filter the notifications, so I get a whole barrage of “service OK” notifications.
These notifications are killing my cellphone :(. Can anyone point me to a better way to configure this?
Thanks!
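
For reference, a service dependency of the kind described above would look roughly like the sketch below. This is an illustration only, not a copy of the real config: "myhost", "PING" and "HTTP" are placeholder names, and the failure criteria shown are an assumption about a typical setup.

```
# Sketch only: "myhost", "PING" and "HTTP" are placeholder names.
# HTTP on myhost depends on PING on the same host.
define servicedependency{
        host_name                       myhost
        service_description             PING
        dependent_host_name             myhost
        dependent_service_description   HTTP
        ; n = never skip active checks of HTTP because of PING's state
        execution_failure_criteria      n
        ; suppress HTTP notifications while PING is WARNING, UNKNOWN or CRITICAL
        notification_failure_criteria   w,u,c
        }
```

With a definition like this, notifications for HTTP are held back only while PING is in one of the listed states. Once PING is OK again, nothing filters the dependent service's notifications, which is consistent with the flood of recovery messages described above.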