I’ve got a strange problem with the new Nagios 2.5 when many hosts
fail at once… I’m monitoring about 400 services and about 200 hosts,
and a few days ago I had a big failure (about 100 services). The
problem is that the statuses were updated very slowly… after
30 minutes I had only about 15 notifications with critical status.
Of course every service is checked every 5 minutes, but during
the failure I could see, for a service I knew was not working,
something like this:
Next Scheduled Active Check: 11-21-2006 09:42:31
Latency: 252.565 seconds
and it was 09:56 when I looked at it… why wasn’t this service
checked? Or, more precisely, why didn’t Nagios mark it
as “critical”, because I believe it was checked.
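
To put a number on it, here is a rough Python sketch of how overdue
that check was. The two values come from the status page quoted
above; the exact viewing time of 09:56:00 is an assumption (I only
noted the minute), and the function name is just for illustration.

```python
from datetime import datetime

# Values copied from the status page quoted above.
NEXT_SCHEDULED = "11-21-2006 09:42:31"
REPORTED_LATENCY = 252.565  # seconds

# Assumption: the page was viewed at exactly 09:56:00.
VIEWED_AT = "11-21-2006 09:56:00"

# The status page prints dates as month-day-year.
FMT = "%m-%d-%Y %H:%M:%S"


def overdue_seconds(scheduled: str, now: str) -> float:
    """Return how many seconds past its scheduled time a check is."""
    delta = datetime.strptime(now, FMT) - datetime.strptime(scheduled, FMT)
    return delta.total_seconds()


overdue = overdue_seconds(NEXT_SCHEDULED, VIEWED_AT)
print(f"check overdue by {overdue:.0f} s (about {overdue / 60:.1f} min)")
print(f"that is {overdue / REPORTED_LATENCY:.1f}x the reported latency")
```

So the check was roughly 13 minutes late, well beyond both the
5 minute check interval and the latency shown on the page.
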
Now I’m trying to reproduce the problem with iptables…
I have Nagios 2.5, and I cut off part of the network with
iptables -I OUTPUT -d dest_network/24 -j DROP
which cuts off about 100 hosts (and 100 PING services).
After 25 minutes only 10 hosts were recognized as down, and
the situation with “Next Scheduled Active Check” occurred again.
Of course, neither the DNS service nor the gateway etc. is in this network.
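
For scale, a back-of-the-envelope Python sketch of what that
detection rate would mean if it stayed constant. The constant rate
is purely an assumption (I have not verified it), and the host
count is approximate.

```python
# Numbers from the iptables test described above.
BLOCKED_HOSTS = 100    # approximate
DETECTED_HOSTS = 10
ELAPSED_MINUTES = 25


def minutes_to_detect_all(blocked: int, detected: int, elapsed: float) -> float:
    """Linear extrapolation: assumes hosts keep being detected at the same rate."""
    minutes_per_host = elapsed / detected
    return minutes_per_host * blocked


estimate = minutes_to_detect_all(BLOCKED_HOSTS, DETECTED_HOSTS, ELAPSED_MINUTES)
print(f"{ELAPSED_MINUTES / DETECTED_HOSTS:.1f} min per host, "
      f"about {estimate:.0f} min to flag all {BLOCKED_HOSTS} hosts")
```
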
The very strange thing is that when I remove the block with
iptables -D OUTPUT -d dest_network/24 -j DROP
suddenly 70 new critical statuses appeared on the Nagios web page…
Why is that? If I disable a few (2-5) services, everything
works fine and I get the information quickly. But if I disable
many (tens, hundreds) of services, everything is very slow.
Below is my nagios.cfg config (not all of it, with the irrelevant
parts left out). I would be grateful for any help…