NDOutils causing Nagios to hang


#1

Information about the system:
Red Hat Enterprise Linux 5.3 x86_64
Nagios 3.0.6
NDOutils 1.4b7

I came across a nasty bug where NDOutils stopped feeding data into the MySQL database and Nagios stopped executing checks. Searching Google turned up other people who appear to have the same problem, some of whom suggested it might be fixed in a later Nagios version. I read the changelogs and found nothing about this, so I am not sure that upgrading to 3.1.0 will really solve anything. Does anyone here have a workaround or fix for this problem?

If it helps, here’s my ndomod.cfg:

instance_name=default
output_type=unixsocket
output=/var/nagios/rw/ndo.sock
tcp_port=5668
output_buffer_items=5000
buffer_file=/var/nagios/ndomod.tmp
file_rotation_interval=14400
file_rotation_timeout=60
reconnect_interval=15
reconnect_warning_interval=15
data_processing_options=-1
config_output_options=2

and my ndo2db.cfg:

ndo2db_user=nagios
ndo2db_group=nagios
socket_type=unix
socket_name=/var/nagios/rw/ndo.sock
tcp_port=5668
db_servertype=mysql
db_host=dbhost.example.org
db_port=3306
db_name=nagios
db_prefix=nagios_
db_user=someuser
db_pass=somepass
max_timedevents_age=1440
max_systemcommands_age=1440
max_servicechecks_age=1440
max_hostchecks_age=1440
max_eventhandlers_age=10080
debug_level=0
debug_verbosity=1
debug_file=/var/log/nagios/ndo2db.debug
max_debug_file_size=1000000
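
For a unix socket setup like the one above, the `output` path in ndomod.cfg and the `socket_name` path in ndo2db.cfg have to point at the same file, so that is worth ruling out first. Below is a minimal sketch in Python that parses both files as plain `key=value` text and compares those settings. The helper names `parse_cfg` and `check_pairing` are hypothetical, not part of NDOutils, and the sketch only covers the `unixsocket` case shown in this post.

```python
def parse_cfg(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def check_pairing(ndomod, ndo2db):
    """Return a list of mismatches between ndomod.cfg and ndo2db.cfg settings.

    Only the unix socket transport from the post is checked (an assumption
    of this sketch); other output types return an empty list.
    """
    problems = []
    if ndomod.get("output_type") == "unixsocket":
        if ndo2db.get("socket_type") != "unix":
            problems.append("ndo2db socket_type is not 'unix'")
        if ndomod.get("output") != ndo2db.get("socket_name"):
            problems.append("socket paths differ")
    return problems
```

With the two configs posted here, `check_pairing` would report no mismatches, since both sides use `/var/nagios/rw/ndo.sock`. That suggests the hang is not a simple path mismatch.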


#2

I ran into this problem today when some miscoded plugins returned exit code 255 instead of one of the standard Nagios return codes (0, 1, 2, or 3). Were your circumstances similar?
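
If out-of-range exit codes do turn out to be the trigger, one possible stopgap is to run the misbehaving plugins through a small wrapper that maps anything outside 0-3 to UNKNOWN (3). The following is a sketch only: `normalize_exit_code` and `run_plugin` are hypothetical names, the 30-second timeout is an arbitrary choice, and I have not confirmed that this avoids the hang.

```python
import subprocess

VALID_STATES = {0, 1, 2, 3}  # OK, WARNING, CRITICAL, UNKNOWN
UNKNOWN = 3


def normalize_exit_code(code):
    """Map any exit code outside the standard plugin range to UNKNOWN."""
    return code if code in VALID_STATES else UNKNOWN


def run_plugin(argv, timeout=30):
    """Run a plugin command and return (normalized_code, first-pass output).

    A plugin that cannot be started or that times out is reported as UNKNOWN.
    """
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired) as exc:
        return UNKNOWN, f"UNKNOWN: could not run plugin: {exc}"
    return normalize_exit_code(proc.returncode), proc.stdout.strip()
```

A wrapper script built on `run_plugin` would print the returned output and exit with the normalized code, so Nagios only ever sees 0, 1, 2, or 3.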

My setup is Nagios 3.1.0 with NDOutils 1.4b7 on RHEL 5.3 x86_64. I ran strace on the hung Nagios process and saw it writing to a Unix socket on FD 4, but according to lsof nothing else on the system was reading that socket. So it looks like NDOutils stopped reading the socket and Nagios hung.
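
To illustrate the general mechanism (this is not NDOutils code), here is a small Python sketch of what happens when one end of a Unix stream socket keeps writing while the other end never reads: the kernel buffer fills up, and from then on a blocking write would stall. The sketch uses a non-blocking socket so it can detect that point instead of hanging. The function name `bytes_until_blocked` is hypothetical, and whether the Nagios broker module writes in blocking mode is an assumption on my part, though it would match the strace output.

```python
import socket


def bytes_until_blocked(chunk_size=4096):
    """Write to a Unix socket pair whose peer never reads.

    Returns the number of bytes the kernel accepted before a further
    write would have blocked.
    """
    writer, reader = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    writer.setblocking(False)  # raise instead of hanging when the buffer is full
    chunk = b"x" * chunk_size
    sent = 0
    try:
        while True:
            try:
                sent += writer.send(chunk)
            except BlockingIOError:
                # Buffer is full: a blocking writer would be stuck right here.
                return sent
    finally:
        writer.close()
        reader.close()
```

The exact byte count depends on the kernel's socket buffer settings, but it is always finite, so a writer that keeps sending to a reader that has stopped consuming will eventually stall.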