Hi,
Hope everyone had a good new year - now back to the work…
MarkJ wrote: …
…
If I do this, what happens if the passive check results don’t get back to the Nagios server (i.e. a WAN interruption)? Will I lose the alert?
Jakkedup Wrote:…
The alert is coming from your nagios server when you are using a passive check such as this. If your network is messed up, then you won’t be getting any status updates to the nagios server with ANY passive checks. Shouldn’t you be monitoring the network devices too? i.e. switches, spanning tree, routers, etc?
Yes, we are monitoring all the network switches, VPNs etc. for failures. What I want to avoid is a brief network issue preventing other issues from being identified and looked at. I guess the way to do this is to collect all the Nagios passive checks into a single location on each site and have these sent back to the one central site. I guess I should see if NC_Net retries sending passive results if the first attempt fails…
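To make the "collect on each site and resend" idea concrete, here is a minimal Python sketch of a per-site spool. It is only an illustration under assumptions: `PassiveResult`, `format_result` and `flush_spool` are hypothetical names, the `send` callable stands in for whatever actually ships the line to the central server (for example piping it to `send_nsca`), and none of this says anything about how NC_Net itself behaves.

```python
from collections import deque
from typing import Callable, Deque, NamedTuple


class PassiveResult(NamedTuple):
    """One passive service check result (hypothetical helper type)."""
    host: str
    service: str
    return_code: int  # Nagios plugin codes: 0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN
    output: str


def format_result(result: PassiveResult) -> str:
    """Build the tab-separated line that send_nsca reads on stdin."""
    return (f"{result.host}\t{result.service}\t"
            f"{result.return_code}\t{result.output}\n")


def flush_spool(spool: Deque[PassiveResult],
                send: Callable[[str], None]) -> int:
    """Try to deliver queued results oldest-first.

    Stops at the first failure and leaves the remaining results in the
    spool, so nothing is lost during a WAN interruption. Returns the
    number of results delivered on this attempt.
    """
    sent = 0
    while spool:
        line = format_result(spool[0])
        try:
            send(line)          # assumed to raise OSError on network failure
        except OSError:
            break               # link is down: keep everything, retry later
        spool.popleft()         # only drop a result once it has been sent
        sent += 1
    return sent
```

Calling `flush_spool` from a scheduled task every minute or so would mean a brief outage only delays the results rather than dropping them. Whether the central server still treats late results as useful depends on its freshness settings.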
MarkJ wrote: …
…
Anyway, at the moment the passive check is working, and when an error appears in the event log, Nagios shows me a Critical and I get an email.
Jakkedup Wrote:…
Great, that is just how it should work.
MarkJ wrote: …
…
However, once the error drops out of the bottom of the log, Nagios clears the alert back to OK. Shouldn’t the alert persist, or can I not get it to persist, until I manually acknowledge the problem?
Jakkedup Wrote:…
Again, that would be the worst thing you could want. If it worked that way, then nagios would suspend making any checks of your log and would not email you of any NEW problems, until someone clicked the box “Acknowledged”. Absolutely not a good idea.
[Agreed - markj]
Jakkedup Wrote:…
I don’t see what the problem is. Nagios is getting passive check results from a remote machine that looks at a log file. If a problem is found, nagios alerts you with an email and sets the status to “critical”. Upon the next check in 5 minutes, the log check passes with no trouble and the status is set to “OK”. Since it’s a volatile service, even if the check is found to be “critical” again, you will again get an email.
I guess my issue here is partly due to the signal-to-noise ratio in my inbox. If I have an Active Directory replication problem over the weekend, I don’t want to get a few hundred emails about it. I would like to be able to switch off notifications over the weekend (for some servers) and be able to check the Nagios status page to see what’s broken.
Jakkedup Wrote:…
I have many log file checks and they all behave that way. Nobody in their right mind is going to be looking at the nagios website for errors such as this. This type of service check is transient and the ONLY way to handle it is with the email notification.
Perhaps the problem is that I’m not in my right mind! Specific services running on specific hosts I can check, i.e. that my SMTP server is working or that DNS responds. What I’m looking to catch is any errors in the log (filtering out the chaff), and for Nagios to display that something was not quite right, i.e. that my Active Directory has decided to stop syncing (it appears to only tell you this once). In other words, I would like some visual clue in Nagios that someone should investigate the log in case we have a problem - this is an issue that won’t clear itself, but will stop complaining in the log.
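One way to get that kind of lasting visual clue without changing how Nagios itself works would be to latch the state on the checking side, before the result is submitted. The following is a rough Python sketch under assumptions: `latched_status` and `clear_latch` are hypothetical helpers, the state file location is up to you, and only the standard plugin codes 0 to 2 are considered (UNKNOWN is left out to keep it short).

```python
import json
from pathlib import Path

# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL = 0, 1, 2


def latched_status(current: int, state_file: Path) -> int:
    """Return the worst status seen since the latch was last cleared.

    `current` is the result of this run of the log check. The worst
    value so far is remembered in `state_file`, so an error that has
    scrolled out of the log still shows up as a problem.
    """
    previous = OK
    if state_file.exists():
        previous = json.loads(state_file.read_text())["worst"]
    worst = max(previous, current)
    state_file.write_text(json.dumps({"worst": worst}))
    return worst


def clear_latch(state_file: Path) -> None:
    """Manual 'I have looked at it' step: forget the remembered status."""
    state_file.unlink(missing_ok=True)  # missing_ok needs Python 3.8+
```

The log check would submit `latched_status(result, state_file)` instead of the raw result, so the service stays Critical on the status page until someone runs `clear_latch`. Unlike acknowledging in the web interface, the checks keep running the whole time. The trade-off is that a new error arriving while the latch is already set does not change the state, so it relies on the service being volatile (or on someone clearing the latch) to be noticed.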
Jakkedup Wrote:…
Again, if email is down, or the network is down, then the least of your troubles is some logfile check. You have much bigger problems at that point and should have been checking the network with nagios so you would know exactly what is broken, or what network drop cable is unplugged.
True, however some of these issues (i.e. VPNs going down) resolve themselves by the time I wake up - I still think I want to be able to see what’s broken with a quick look at a status webpage rather than trawling through my inbox.
It can’t just be me who thinks it’s easier to have a quick look at a webpage than to find the important emails in amongst the spam in my inbox?
Thanks for your help,
Mark