Hey all,
We've been experiencing some strange issues with our distributed Nagios setup.
We see the events arrive from the distributed (slave) server at the master server just fine. Each event is written to the external command file, and Nagios seems to pick these events up without trouble. The problem seems to be with Nagios updating itself. Watching nagios.log shows events being processed, but sometimes processing hangs for long periods. Eventually the master server “bombs” out or stops responding to Nagios updates, and we have to restart Nagios.

I’ve confirmed that “aggregate status updates” is enabled and that the interval for checking the external command file is set to “-1”. The load on the machine is around 1.00 to 1.30.
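For reference, the settings described above correspond roughly to the following nagios.cfg directives. This is a sketch assuming a Nagios 2.x-era nagios.cfg; the command_file path and the status_update_interval value are placeholders, not copied from the actual configuration:

```
# Sketch only: directive names are standard nagios.cfg options,
# but the path and interval values below are assumed placeholders.

# Accept commands submitted by the distributed (slave) server
check_external_commands=1
# Assumed default location of the external command file (a named pipe)
command_file=/usr/local/nagios/var/rw/nagios.cmd
# -1 = check the external command file as often as possible
command_check_interval=-1

# Write status data in batches instead of after every event
aggregate_status_updates=1
# Assumed value: seconds between aggregated status updates
status_update_interval=15
```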
Please let me know if you have any suggestions that might help.
Thanks!!
Adam Ward