Distributed Nagios Question


#1

Hey all,

We’ve been experiencing some strange issues with our distributed Nagios setup.
Events come from the distributed (slave) server to the master server just fine. Each event is written to the external command file, and Nagios seems to pick these events up without trouble. The problem seems to be Nagios updating itself: watching nagios.log shows events being processed, but sometimes processing hangs for long periods, and eventually the master server “bombs” out (stops responding to updates) and we have to restart Nagios.

I’ve confirmed that “aggregate status updates” is enabled (aggregate_status_updates=1) and that the interval for checking the external command file is set to “-1” (command_check_interval=-1). The load on the machine is around 1.00 to 1.30.

Please let me know if you have any suggestions that might help.

Thanks!!
Adam Ward
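Since the symptom is nagios.log going quiet for long stretches, one way to pin down exactly when the master stalls is to scan the log for gaps between timestamps. A minimal Python sketch, assuming each log line starts with a bracketed Unix timestamp (the usual nagios.log format) and that a 60-second gap is worth flagging; `find_stalls`, `report_stalls` and the threshold are illustrative choices, not part of Nagios.

```python
import re
from typing import Iterable, List, Optional, Tuple

# nagios.log lines begin with a bracketed Unix timestamp, for example:
# [1199145600] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;host;service;0;OK
TIMESTAMP_RE = re.compile(r"^\[(\d+)\]\s*(.*)$")


def find_stalls(lines: Iterable[str], threshold: int = 60) -> List[Tuple[int, int, str]]:
    """Return (last_ts_before_gap, gap_seconds, message_after_gap) for every
    gap of at least `threshold` seconds between consecutive timestamped lines."""
    stalls: List[Tuple[int, int, str]] = []
    prev_ts: Optional[int] = None
    for line in lines:
        match = TIMESTAMP_RE.match(line.strip())
        if match is None:
            continue  # ignore lines without a leading [timestamp]
        ts, message = int(match.group(1)), match.group(2)
        if prev_ts is not None and ts - prev_ts >= threshold:
            stalls.append((prev_ts, ts - prev_ts, message))
        prev_ts = ts
    return stalls


def report_stalls(path: str, threshold: int = 60) -> None:
    """Print each stall found in the log file at `path`."""
    with open(path) as log:
        for start, gap, message in find_stalls(log, threshold):
            print(f"{gap}s of silence after {start}, resumed with: {message}")
```

For example, `report_stalls("/usr/local/nagios/var/nagios.log", 120)`; that path is only a common source-install default, so substitute whatever your `log_file` setting points to. On a lightly loaded system a quiet period is not necessarily a hang, so compare the flagged times against when the slave was known to be sending results.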


#2

Hey, I’ve got a guess. See if this helps:

Check the service_perfdata_file_mode value in your nagios.cfg. There was a bug in 2.6 and earlier where service_perfdata_file_mode=w would append and =a would write (the meanings were swapped). If the file is being appended to, all the data keeps stacking up, so try switching the value. You want it to write, not append, so in 2.9 use ‘w’ and in earlier versions use ‘a’.

Lemme know if that helps. I might have this mixed up with a graphing issue, though.
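To check which behavior a given install actually gets, the rule from the reply can be encoded as a small Python sketch. It assumes, as the reply states, that 2.6 and earlier have the `w`/`a` meanings swapped and later releases do not; `parse_nagios_cfg` and `effective_perfdata_behavior` are hypothetical helpers, and the parser only handles simple `key=value` lines.

```python
from typing import Dict, Tuple


def parse_nagios_cfg(text: str) -> Dict[str, str]:
    """Parse simple key=value lines from nagios.cfg text.

    Blank lines and comment lines (starting with '#' or ';') are skipped.
    A repeated key keeps its last value, so multi-valued options such as
    cfg_file are not preserved."""
    settings: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def effective_perfdata_behavior(mode: str, version: Tuple[int, ...]) -> str:
    """Return 'write' or 'append' for a service_perfdata_file_mode value.

    Assumption taken from the reply: on 2.6 and earlier 'w' and 'a' are swapped."""
    if mode not in ("w", "a"):
        raise ValueError(f"unexpected service_perfdata_file_mode: {mode!r}")
    swapped = tuple(version[:2]) <= (2, 6)
    writes = (mode == "w") != swapped  # flip the meaning on affected versions
    return "write" if writes else "append"
```

For example, with `settings = parse_nagios_cfg(open("nagios.cfg").read())`, calling `effective_perfdata_behavior(settings["service_perfdata_file_mode"], (2, 6))` returns `"write"` only when the file contains `a`, which matches the advice above for the affected versions.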