I’m getting pretty annoyed that Nagios shows different states whenever I refresh my master GUI. For instance, a given service can be CRITICAL at one moment and OK when I hit refresh (and vice versa). The duration of the check result flips as well: it can say the service has been CRITICAL for a few hours, and after a refresh it says it has been OK. Can someone explain how this can happen?
Here are some specifications of my setup:
hosts: 1000+
services: 4000+
We have one master receiving check results from 6 different slaves through NSCA. The master itself doesn’t run any checks; it only processes the passive results. I can’t see anything unusual in the event or debug log. If we scale down the number of results, the problem doesn’t occur.
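To make the passive setup concrete, here is a minimal sketch of the kind of external command line the master ends up processing for each passive service result. The `PROCESS_SERVICE_CHECK_RESULT` format is the standard Nagios external command; the function name and the host and service names are made up for illustration and are not taken from my configuration.

```python
import time

# Standard Nagios plugin return codes.
STATES = {"OK": 0, "WARNING": 1, "CRITICAL": 2, "UNKNOWN": 3}


def passive_result_line(host, service, state, output, now=None):
    """Build one PROCESS_SERVICE_CHECK_RESULT external command line.

    `now` is an optional Unix timestamp; the current time is used if omitted.
    """
    ts = int(time.time() if now is None else now)
    return (
        f"[{ts}] PROCESS_SERVICE_CHECK_RESULT;"
        f"{host};{service};{STATES[state]};{output}"
    )


# Hypothetical host and service names, for illustration only.
print(passive_result_line("web01", "HTTP", "CRITICAL", "connection timed out"))
```

With 4000+ services reporting from 6 slaves, the master handles a steady stream of lines like this.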
Luca, thanks for your reply! The process list doesn’t look strange, although I do sometimes see a lot (10-15) of /usr/local/nagios/bin/ndo2db-3x -c /usr/local/nagios/etc/ndo2db.cfg processes.
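For anyone who wants to watch the same number over time, this is roughly how the count could be taken. It is only a sketch: the function names are my own, and it simply matches the string "ndo2db" in the command lines printed by `ps -eo args`.

```python
import subprocess


def count_matching(ps_output, needle="ndo2db"):
    """Count lines of ps output whose command line contains `needle`."""
    return sum(1 for line in ps_output.splitlines() if needle in line)


def count_running(needle="ndo2db"):
    """Run `ps -eo args` (full command line per process) and count matches."""
    result = subprocess.run(
        ["ps", "-eo", "args"], capture_output=True, text=True, check=True
    )
    return count_matching(result.stdout, needle)
```

Calling `count_running()` every few seconds and logging the result should show whether the number of ndo2db processes climbs when the states start flipping.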
I disabled ndomod and ndo2db yesterday, and this seems to make the setup stable. Is there some way that NDO can cause this strange behavior, and if so, what can I change to make it stable? We use the ndo2db extension to fill a database with details for our Cacti/event handling, so disabling it for a long period isn’t much of an option.