I am running Nagios 1.2 on a well-specced central server under RHEL 3, which collects data from remote Nagios servers using NSCA/OCSP.
A total of 3000+ checks are carried out by 5 remote servers.
The Nagios status.log on the central server is usually up to date, but these results are not shown in the CGI.
After a reboot (or each time Nagios is restarted) the total number of “nagios -d” processes starts at 3 but gradually climbs until it reaches 1500+.
The server doesn’t show signs of a high load average, but eventually it keels over and dies, needing a reboot.
I’ve tried adjusting possibly relevant parameters in nagios.cfg with no luck at all.
Seems like some check is hanging and the children don’t return…
Try this…
Take the PID of one of the nagios -d processes and grep the process list for it. Does it return some defunct processes?
I’m having a problem with Nagios hanging in this situation, so it could be useful.
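The check suggested above can be sketched in Python. This is only an illustration, assuming a Linux `ps` that accepts `-eo pid,ppid,stat,args`; the function names (`parse_ps`, `children_of`, `defunct`, `snapshot`) are made up for this example and are not part of Nagios.

```python
import subprocess


def parse_ps(ps_output):
    """Parse the text printed by `ps -eo pid,ppid,stat,args` into dicts."""
    rows = []
    for line in ps_output.splitlines():
        parts = line.split(None, 3)
        # Skip the header row and anything that is not a process line.
        if len(parts) < 3 or not parts[0].isdigit():
            continue
        rows.append({
            "pid": int(parts[0]),
            "ppid": int(parts[1]),
            "stat": parts[2],
            "args": parts[3] if len(parts) > 3 else "",
        })
    return rows


def children_of(rows, parent_pid):
    """Processes whose parent is `parent_pid`."""
    return [r for r in rows if r["ppid"] == parent_pid]


def defunct(rows):
    """Processes in state Z (zombie), which ps labels <defunct>."""
    return [r for r in rows if r["stat"].startswith("Z")]


def snapshot():
    """Run ps once and return the parsed process table."""
    result = subprocess.run(
        ["ps", "-eo", "pid,ppid,stat,args"],
        capture_output=True, text=True, check=True,
    )
    return parse_ps(result.stdout)
```

Calling `defunct(children_of(snapshot(), pid))` with the PID of one "nagios -d" process would list its zombie children, if any. A growing list would point towards children that exit without being reaped; an empty one would suggest the processes are still alive and waiting on something.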
The number of nagios daemons now seems under control by means of an hourly script.
The status.log file contains reasonably up-to-date information but this is not being displayed on the web page which still has entries 2 or 3 days old. So it is still not working correctly.
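One way to separate "the daemon has stopped writing status.log" from "the CGI is not showing what is written" is to check how recently the file was modified. The following is a minimal sketch; the five-minute threshold and the helper names are assumptions for illustration, and the path has to be whatever your nagios.cfg points at.

```python
import os
import time


def status_age_seconds(path, now=None):
    """Seconds elapsed since the file at `path` was last modified."""
    if now is None:
        now = time.time()
    return now - os.path.getmtime(path)


def is_stale(path, max_age_seconds=300, now=None):
    """True if the file has not been rewritten within `max_age_seconds`."""
    return status_age_seconds(path, now) > max_age_seconds
```

If `is_stale(...)` stays false while the web page still shows entries that are days old, the file is being rewritten and the problem lies on the display side rather than with the daemon.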
I have taken 1800 of the checks off this server and built a second server to handle these. This solved all problems immediately.
I guess there is a limit coded somewhere, but I don’t know what it is, so it is only a guess when I say it could be 2000 or 2500. Anyway, each central server is now handling about 1800 checks and everything is fine.
Going to try to install multiple instances of nagios on the one server next…
Thanks again for the suggestions.
PS - does anyone know why one posting from me ends up displayed 3 times here?
So tell us what you had before and what you have now. Just for clarity.
So you had 3000+ service checks; how many were passive? Could you have fixed it by using distributed servers? But it sounds like you are already using distributed servers. Seems like you have achieved what I have been trying to do: bring Nagios to its knees and make it beg for mercy.
For the time being I have built another server with another instance of nagios on it and spread the checks among the 2.
About 95% of the checks done by these 2 are passive checks.
I’m presently redesigning it all so that one server runs multiple instances of nagios, splitting them that way instead but this will take a little design effort.
FYI I have over 4500 active checks running on a nagios instance which services our internal data centre so the limit I experienced does seem to be related to passive checks.
Still getting 3 posts on here for the price of one!
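For anyone planning the same kind of split across servers or instances, here is a rough sketch of one way to divide hosts so that every check for a given host lands on the same instance. It is not how the split described above was actually done; the function names and the choice of CRC32 as the hash are assumptions for illustration.

```python
import zlib


def assign_instance(host_name, instance_count):
    """Map a host name to an instance index in a repeatable way."""
    # crc32 gives the same value on every run, unlike the built-in
    # hash() for strings, which is randomised per process.
    return zlib.crc32(host_name.encode("utf-8")) % instance_count


def partition_hosts(host_names, instance_count):
    """Return one list of host names per instance."""
    groups = [[] for _ in range(instance_count)]
    for name in sorted(host_names):
        groups[assign_instance(name, instance_count)].append(name)
    return groups
```

Hashing does not guarantee equal group sizes, so the result still needs checking against whatever per-instance ceiling you observe (about 1800 mostly passive checks per server worked in this thread).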