Nsca spawning out of control

jakkedup · January 26, 2006, 4:12pm

After that, we will have to look at all of your services.cfg and nagios.cfg settings again.

jakkedup · January 26, 2006, 4:23pm

The thread “nagios stalling” reminded me of something.
Nagios will stop processing anything else until it has completed the host checks that are spawned due to a service check failure.
So make sure all of your host checks can actually succeed.
For example:
I have a passive check configured on my nagios server, and at times it fails. When it fails, the host check is performed, but the host check I have defined isn’t actually possible, since the device is not pingable from my nagios server. So nagios spends far too much time on this host check, and the server builds up a backlog of nsca connections.
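One way to follow the advice above is to verify, from the Nagios server itself, that every host you rely on for host checks actually answers a ping. The sketch below assumes your host checks are ping based (as in the example) and that the Linux iputils `ping` is installed (`-c 1` sends one packet, `-W 2` waits at most 2 seconds). The `hosts` mapping and the `unreachable_hosts` name are hypothetical; fill them in from your own host definitions.

```python
import subprocess

def unreachable_hosts(hosts, run=subprocess.run):
    """Return the names of hosts that do not answer a single ping from this machine.

    hosts: hypothetical dict of {host_name: address}, taken from your own config.
    run:   injectable command runner (defaults to subprocess.run) so it can be tested.
    """
    bad = []
    for name, address in hosts.items():
        # One echo request with a 2-second timeout; a non-zero exit status means no reply.
        result = run(
            ["ping", "-c", "1", "-W", "2", address],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,
        )
        if result.returncode != 0:
            bad.append(name)
    return bad

if __name__ == "__main__":
    # Hypothetical hosts; replace with the ones from your own configuration.
    example = {"core-switch": "192.0.2.1", "remote-device": "198.51.100.7"}
    for host in unreachable_hosts(example):
        print(f"{host}: not pingable from here, so a ping-based host check will just time out")
```

Any host this reports is a candidate for a different host check command, or for no host check at all, so a failing service check does not leave Nagios stuck waiting on it.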

brian89gp · February 13, 2006, 9:19am

I have been doing some more experimenting on fresh installs of Fedora 3 and CentOS 4.2, using both the default config files for everything and my slightly customized ones.

There seems to be one commonality: once perfparse is installed, Nagios starts locking up/dying, no matter which OS or config files are used. Before it is installed, everything works and the Nagios process never hangs.

The big problem is that if the Nagios process stops responding for 10 minutes, the box becomes sluggish to the point of being unusable; after 20 minutes it is a hard lockup.

I have made a quick and dirty fix that is working quite well, and I will probably leave it in place until future versions of perfparse and Nagios work better together. A cron job runs every 10 minutes: it tries a graceful shutdown via “service nagios stop”, then runs “killall -9 nagios”, and then starts Nagios again. It has been running for 2 weeks with no ill effects so far.
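The restart sequence described above can be sketched as a small script for cron to call. This is only an illustration of the stop, kill, start order from the post, not the poster’s actual job; the `restart_nagios` name, the 5-second settle pause and the script path in the crontab line below are assumptions.

```python
import subprocess
import time

# Order taken from the post: graceful stop, forced kill of any leftovers, start.
RESTART_SEQUENCE = [
    ["service", "nagios", "stop"],
    ["killall", "-9", "nagios"],
    ["service", "nagios", "start"],
]

def restart_nagios(run=subprocess.run, pause=time.sleep, settle_seconds=5):
    """Run the stop/kill/start sequence and return (command, exit code) pairs.

    run and pause are injectable so the sequence can be tested without root.
    settle_seconds is an assumed grace period before the forced kill.
    """
    results = []
    for cmd in RESTART_SEQUENCE:
        if cmd[0] == "killall":
            # Give the graceful stop a moment to finish before sending SIGKILL.
            pause(settle_seconds)
        # check=False: killall exits non-zero when no nagios process is left,
        # which is normal after a clean stop and must not abort the restart.
        completed = run(cmd, check=False)
        results.append((" ".join(cmd), completed.returncode))
    return results

if __name__ == "__main__":
    for command, code in restart_nagios():
        print(f"{command}: exit {code}")
```

Scheduled every 10 minutes, a hypothetical crontab entry would look like `*/10 * * * * /usr/local/bin/nagios_restart.py`. Note that this restarts Nagios unconditionally; it works around the hang rather than fixing its cause.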

brian89gp · February 13, 2006, 9:24am

The central server, the one locking up, does not perform any checks. It is 100% passive.

I also recently tried Nagios 2.0 final with the same problems.

On a side note, are there any good alternatives to perfparse for logging/reporting? We are having countless problems with it, and unfortunately the Availability and Trending CGIs built into Nagios don’t work for us, because the nagios.log file grows to 2GB per week and has to be rolled.

jakkedup · February 13, 2006, 1:03pm

I use nagiostat, and many others in this forum do too. I have a very large number of both active and passive checks, and many of them submit performance data to nagiostat, all on the same box.