Nagios "froze"


#1

After fiddling with enabling performance data, my Nagios (2.0rc2 on Red Hat FC4) froze up for a few hours. It answered queries, but no service checks were performed for four hours. I had to kill and restart it. Found these entries in nagios.log:

[1137616821] service_result_worker_thread(): poll(): EINTR (impossible)

I have in nagios.cfg:
process_performance_data=1
host_perfdata_command=process-host-perfdata
service_perfdata_command=process-service-perfdata
#host_perfdata_file=host-perfdata
#service_perfdata_file=service-perfdata

I have a fair number of services (about 700). Weird, this has never happened to me before, in 5+ years of using Nagios. It is possible I started Nagios from the command line and forgot to background it, in which case I’m a big dummy. I’ll report back if it happens again.


#2

Wow, I wiped out retention.dat and now everything is just sitting in Pending state. A few checks ran, saw them in ps output, but their status did not show up in the web page. There is one child of the main Nagios process and it can’t be killed with signal 15.

[later]

Well, I disabled process_performance_data and it’s back to normal. Very strange.


#3

I re-enabled process_performance_data, disabled host_perfdata_command and service_perfdata_command, and I’m now using the “internal” performance data logging without running an external command. It seems to work with this configuration, so possibly the problem was due to the external command execution.

All I want to do is log the perfdata for a selected few services. I use cron-driven scripts to stuff the data into RRDs without relying on Nagios to do anything other than log the data.
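
For anyone wanting to do the same, here is a rough sketch of the parsing half of such a cron script. It is not my actual script. It assumes the perfdata file is written with a tab-separated template of timestamp, host, service and perfdata (that layout, the WANTED set and the host/service names are made up for illustration), and that perfdata items follow the usual plugin form label=value[UOM];warn;crit;min;max. Feeding the values to rrdtool is left out.

```python
import re

# Hypothetical selection of (host, service) pairs to keep.
WANTED = {("web01", "HTTP")}

# One perfdata item: label=value[UOM], optionally followed by ;warn;crit;min;max.
# Labels may be wrapped in single quotes if they contain spaces.
PERF_RE = re.compile(
    r"(?:^|(?<=\s))(?:'([^']+)'|([^\s=']+))=(-?\d+(?:\.\d+)?)([^;\s]*)"
)

def parse_perfdata(perf):
    """Return {label: (value, unit)} for each item in a perfdata string."""
    out = {}
    for m in PERF_RE.finditer(perf):
        label = m.group(1) or m.group(2)
        out[label] = (float(m.group(3)), m.group(4))
    return out

def parse_line(line):
    """Parse one log line in the ASSUMED layout: ts<TAB>host<TAB>service<TAB>perfdata.

    Returns (timestamp, host, service, values), or None if the line is
    malformed or the service is not in WANTED.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 4:
        return None
    ts, host, service, perf = fields
    if not ts.isdigit() or (host, service) not in WANTED:
        return None
    return int(ts), host, service, parse_perfdata(perf)
```

The cron job would read the file line by line, call parse_line on each, and pass whatever comes back to the RRD update step. Lines for services that are not selected are simply skipped.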


#4

Do you have the process-host-perfdata and process-service-perfdata command definitions in your cfg files? (Usually in misccommands.cfg, if I remember right.)

Luca