Hi everyone.
We have been using Nagios in our production environment for about a year, primarily to monitor the URLs of our web services. One thing I have noticed is that we get some false alarms, and they follow an odd pattern. We check each service every 2 minutes and alert after 5 consecutive critical results. On many occasions we get the failure notification, and then the very next check (the 6th) recovers.

In a few of these instances I have noticed DNS timeouts from our hosting provider, but it seems very odd that the same pattern shows up on so many occasions. I would say roughly 20% of our alerts look like this, and I'm not sure how to investigate. When I check the service manually from another source, it is up and running. Could sockets be held open and choking the new requests? I was just wondering if anyone has experienced something similar.
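One way to narrow this down is to run a small probe on the Nagios host itself that times the DNS lookup and the HTTP request separately, so each failure can be attributed to a stage. The sketch below is a minimal example under stated assumptions, not a Nagios plugin: the function name `probe`, the stage labels (`dns`, `http`, `request`), the 10-second timeout and the 30-second polling interval are all hypothetical choices. Note that `socket.getaddrinfo` has no timeout argument and relies on the system resolver's own timeout, and that the HTTP request may resolve the name again on its own.

```python
import socket
import sys
import time
import urllib.error
import urllib.request
from urllib.parse import urlsplit

# Hypothetical choice: ignore any *_proxy environment variables so the
# probe connects directly to the target host.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def probe(url, timeout=10.0):
    """Time DNS resolution and the HTTP request separately for one URL.

    Returns a dict with elapsed seconds per stage and, on failure, which
    stage failed ("dns", "http" or "request") and the error text.
    """
    result = {"url": url, "dns_s": None, "http_s": None,
              "status": None, "failed_stage": None, "error": None}
    host = urlsplit(url).hostname

    # Stage 1: DNS lookup on its own, so resolver timeouts are not
    # confused with a slow or unresponsive web server. getaddrinfo uses
    # the system resolver's timeout; it has no timeout parameter.
    start = time.monotonic()
    try:
        socket.getaddrinfo(host, None)
    except socket.gaierror as exc:
        result["failed_stage"], result["error"] = "dns", str(exc)
    result["dns_s"] = round(time.monotonic() - start, 3)
    if result["failed_stage"]:
        return result

    # Stage 2: the HTTP request (connect, send, read status line).
    start = time.monotonic()
    try:
        with _OPENER.open(url, timeout=timeout) as resp:
            result["status"] = resp.status
    except urllib.error.HTTPError as exc:
        # Server answered, but with a non-2xx status code.
        result["status"] = exc.code
        result["failed_stage"], result["error"] = "http", str(exc)
    except (urllib.error.URLError, OSError) as exc:
        # Connection refused/reset, or a timeout while connecting or reading.
        result["failed_stage"], result["error"] = "request", str(exc)
    result["http_s"] = round(time.monotonic() - start, 3)
    return result


if __name__ == "__main__":
    target = sys.argv[1]
    while True:
        # One timestamped line per probe, for comparison with alert times.
        print(time.strftime("%Y-%m-%d %H:%M:%S"), probe(target), flush=True)
        time.sleep(30)  # hypothetical interval, shorter than the 2-minute check
```

Running it as `python3 probe.py <service URL>` next to Nagios and lining its output up with the alert history would show whether the failed checks coincide with slow or failed DNS lookups, or with the request stage (which is where held-open sockets or connection limits would show up).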