I have been having issues with Nagios recently. I upgraded to Nagios 3.2 (from 3.0.5) on a machine running Ubuntu 8.04 to try to fix the problem, but it made no difference. The problem is that Nagios is no longer performing checks. When I look at a service, the ‘Next Scheduled Active Check’ is set to a date in the past. In the log files I sometimes see that services are being orphaned. I have also set up Nagios to receive SNMP traps, and I still receive those even though no service checks are being made. Forcing a service check does not do anything either. I have 141 services in total, so I don’t think I am overloading the system.
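For anyone wanting to confirm the same symptom outside the web interface, here is a minimal sketch that lists services whose next scheduled check is already in the past. It assumes the Nagios 3 status file layout (`servicestatus { ... }` blocks with `key=value` lines); the function names are made up for illustration, and the status file location depends on `status_file` in nagios.cfg.

```python
import time


def parse_status_blocks(text, block_type="servicestatus"):
    """Return one dict per '<block_type> {' block found in status file text."""
    blocks = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if current is None:
            if line == block_type + " {":
                current = {}          # start of a block we care about
        elif line == "}":
            blocks.append(current)    # end of the current block
            current = None
        elif "=" in line:
            key, _, value = line.partition("=")
            current[key] = value
    return blocks


def overdue_services(text, now=None):
    """List (host, service, next_check) where next_check is before 'now'."""
    now = time.time() if now is None else now
    overdue = []
    for svc in parse_status_blocks(text):
        try:
            next_check = int(svc.get("next_check", "0"))
        except ValueError:
            continue                  # skip values that are not plain integers
        # next_check=0 means nothing is scheduled, so it is not "overdue"
        if 0 < next_check < now:
            overdue.append(
                (svc.get("host_name"), svc.get("service_description"), next_check)
            )
    return overdue


# Usage (the path is an assumption; use the status_file value from nagios.cfg):
# with open("/path/to/status.dat") as fh:
#     for host, service, ts in overdue_services(fh.read()):
#         print(host, service, time.ctime(ts))
```

If nearly every service shows up in this list, the scheduler has stalled as a whole rather than a few individual checks timing out.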
Does anyone know how to get my Nagios up and running again?
Thanks.
That doesn’t really seem to help. I think I know what the issue is (I just don’t know the solution). Nagios monitors the localhost, and sometimes the number of processes reaches critical (over 700 processes). This is when I think services start to get orphaned. Once these processes are cleared, Nagios starts to run normally again (but not for long, as the processes build up quickly). Nagios is the only thing running on this machine, and I am only monitoring about 140 services in total. I don’t think that is enough to cause such bad performance.
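To see which processes are actually piling up, a rough sketch like the following could help. It counts process names from the output of `ps -eo comm=`; the helper name `top_commands` is hypothetical, and this only shows what is accumulating, not why.

```python
from collections import Counter


def top_commands(ps_output, limit=5):
    """Count process names (one per line) and return the most common ones."""
    names = [line.strip() for line in ps_output.splitlines() if line.strip()]
    return Counter(names).most_common(limit)


# Usage on the Nagios host (assumes a procps-style ps is available):
# import subprocess
# out = subprocess.run(["ps", "-eo", "comm="],
#                      capture_output=True, text=True).stdout
# for name, count in top_commands(out):
#     print(count, name)
```

Running this while the process count is near 700 should show whether the buildup is Nagios itself, its check plugins, or something else on the box.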
Any ideas?
Thanks.
I never saw a large increase in the load. I check most services every 10 minutes or so. I think I have figured out the problem: it happens when I start receiving traps from 3 of my devices. They send traps frequently (I receive at least one trap per minute from one of the devices). Once Nagios starts receiving the traps, it stops doing checks altogether (the event log does not even say that anything is orphaned; all the services just have a next scheduled check time in the past). Why does Nagios just quit on the services?
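To put a number on the trap rate, here is a small sketch that counts passive results per host per minute from the Nagios log. It rests on assumptions not stated above: that the traps are submitted to Nagios as passive service check results through the external command file, and that external commands are being logged, so they appear as `EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;...` lines. The function name is made up for illustration.

```python
import re
from collections import Counter

# Matches lines such as:
# [1262304000] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;host;service;0;text
PASSIVE_RE = re.compile(
    r"^\[(\d+)\] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;([^;]*);"
)


def passive_results_per_minute(log_text):
    """Return a Counter mapping (host, minute_start_epoch) to result count."""
    counts = Counter()
    for line in log_text.splitlines():
        match = PASSIVE_RE.match(line)
        if match:
            timestamp = int(match.group(1))
            host = match.group(2)
            # Round the timestamp down to the start of its minute
            counts[(host, timestamp - timestamp % 60)] += 1
    return counts


# Usage (the log path is an assumption; see log_file in nagios.cfg):
# with open("/path/to/nagios.log") as fh:
#     for (host, minute), n in sorted(passive_results_per_minute(fh.read()).items()):
#         print(host, minute, n)
```

Comparing the minute at which the traps start arriving with the first stale next-check time would show whether the two really line up.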
Thanks.