Nagios Version 2.0
State stalking is exactly what I need to solve a specific problem. However, I would like it to clear acknowledgements when the stalked output changes.
For example, we are using check_ifstatus to report ports that are down on a particular switch.
If hosts go down on this switch, we want to acknowledge that we have seen the problem, but once we do, the next port that goes down won't bring the device back onto the problems screen.
How can I make it reset the acknowledgement when the critical status text changes to different critical text?
Maybe that isn’t possible currently.
Would this be difficult to add?
Mike B.
Split this into single checks for each port, instead of walking the entire tree with one command.
That’s an interesting idea.
Does that mean I would have 24 service checks for this switch?
Mike B.
Yes, but I fail to see why you would want to know the status of every port. Surely not every port has an important host on it. If these are PCs that go up and down every day, why monitor them at all? If they are 24/7 production-critical servers, then yes, monitor the ports they connect to.
They are actually other switches and routers. The biggest problem here is that the link to them has limited bandwidth. Having that many checks, although each one is small, eats up precious bandwidth.
Having only one check avoids this problem. So resetting the acknowledgement only when state stalking is enabled seems like a reasonable solution.
Mike B.
I just went ahead and patched base/checks.c, adding these two lines right after the check for stalk_on_critical:
temp_service->problem_has_been_acknowledged=FALSE;
temp_service->acknowledgement_type=ACKNOWLEDGEMENT_NONE;
Works fine for me. Am I missing any unintended consequences?
Mike B.
I forgot to add: I think Nagios rocks!
Mike B.