Nagios fails to start up properly on boot

locust76 · September 14, 2009, 10:06am

Hi all,
I had a fully functioning Nagios installation before I went on vacation, but when I came back, things were odd. The server was horribly slow (pages took forever to load) and accessing NagVis didn’t work at all. I rebooted the machine thinking it would help, but Nagios didn’t start up properly on boot. It wouldn’t even get far enough to log any data in nagios.log. It would show up in the process list, but it wouldn’t do anything, and accessing the .php files from a web browser produced an error message saying that Nagios wasn’t running.
Long story short, I found out that if I execute Nagios manually ("./nagios -d …/etc/nagios.cfg"), everything works just fine. If I rely on Nagios to start automatically, it won’t.
Anyone have any ideas?

luca · September 15, 2009, 12:35pm

Find out who touched the server during your vacation, cut off his/her hands, change the root password, torture the culprit to find out what he did and try to fix it.

Before anything else you need to find out what happened… As for Nagios, you could try reinstalling; the config files shouldn’t be touched. But if something else on the machine is making it slow, I don’t think that would help much.

locust76 · September 16, 2009, 11:26am

Hehe, unfortunately I’m the only one here with the passwords to the thing, plus I’m the only one who’s brave enough to fool around with Linux. I’ve made tremendous progress (started at the beginning of the year as a complete Linux n00b), but I’m still not even close to being an expert.

I had been fooling with NagVis a few months before my vacation: I believe the combination of MySQL, Nagios and NDO2DB causes quite a bit of disk activity and causes the virtual machine to slow down somewhat, but it wasn’t entirely intolerable. Maybe that had something to do with the extreme slowness when I got back, but I can’t be too sure.

I did use the opportunity to upgrade my Nagios version, but the install process didn’t fix the startup procedure.

How can I get into the system config and find out what exactly the OS is trying to do when it starts the Nagios service? Because, like I said, I can manually execute Nagios and NDO2DB and it works fine. Maybe I need to redo the commands / user settings for the service startup?

luca · September 16, 2009, 11:47am

Try looking at the logs… maybe the Nagios log after a reboot shows that Nagios tries to start before MySQL is up, or something similar.
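If the logs are empty, another way to see what the service startup actually does is to run the init script with shell tracing. This is only a rough sketch: the function name `trace_init_script` is made up for illustration, and the script path `/etc/init.d/nagios` in the usage comment is an assumption about a SysV-style setup. Note that it really runs the start action, so only try it where starting Nagios is acceptable.

```shell
#!/bin/sh
# Run an init script with shell tracing (sh -x) and keep only the trace
# lines that mention nagios. Tracing prints each command after variable
# expansion, so the binary and config file actually used become visible.
# $1 = path to the init script, $2 = action (defaults to "start").
trace_init_script() {
    sh -x "$1" "${2:-start}" 2>&1 | grep -i 'nagios'
}

# Hypothetical usage, as root (the path is an assumption):
#   trace_init_script /etc/init.d/nagios start
```

Comparing the traced command line with the one that works when typed by hand should show any difference in binary or config path.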

locust76 · September 16, 2009, 3:17pm

I couldn’t find anything in the Nagios logs, the ndo2db logs or in messages, so I decided to recreate my problem by rebooting the server. Guess what? Nagios starts on boot now. Problem: ndo2db thinks Nagios is not running, and according to ps aux, Nagios in fact isn’t running, yet I can pull up host info via the web interface just fine! I looked in the web interface under process info, and it shows me a PID of 30207. I tried to kill 30207, but that process ID doesn’t exist! I went into the GUI as root and looked at the system monitor and I couldn’t find the process there either! Weird! Now it’s “working,” but not in a way that makes any damn sense, nor is it effectively communicating with ndo2db!
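For reference, the kind of check I mean looks roughly like this. It is a minimal sketch that assumes a Linux box with /proc mounted; the function name `describe_pid` is made up for illustration.

```shell
#!/bin/sh
# Report whether a PID exists and, if it does, which command line it runs.
# Linux-specific: relies on /proc/<pid>/cmdline, where arguments are
# separated by NUL bytes (converted to spaces here for display).
describe_pid() {
    if [ -n "$1" ] && [ -d "/proc/$1" ]; then
        printf 'PID %s is running: ' "$1"
        tr '\0' ' ' < "/proc/$1/cmdline"
        echo
    else
        echo "PID $1 is not running"
        return 1
    fi
}

# Hypothetical usage with the PID reported by the web interface:
#   describe_pid 30207
```

If the PID is alive, the printed command line also shows which binary and which config file that process was started with.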

locust76 · September 16, 2009, 4:02pm

So I rebooted again and killed the Nagios service via the web interface to make absolutely sure it was down. Then I performed an upgrade to try to overwrite any badness that might have been in there. I manually started nagios and ndo2db and now they seem to be cooperating, and nagios has PID=8673, which is visible via ps aux, so at least I have control over the damn thing again. I’m going to leave it alone for now and come back to it tomorrow maybe.

Any suggestions as to what might have happened? Apart from reading the logs, I didn’t do anything but restart the server, then suddenly I got the demon process from hell that’s invisible to all but itself and cannot be stopped!

luca · September 16, 2009, 4:45pm

no idea… better go look for a goat to sacrifice

locust76 · September 23, 2009, 11:54am

Well, as per usual, I managed to find the solution on my own

The /etc/init.d/nagios startup script was pointing to the wrong binary (/usr/sbin/nagios instead of /usr/local/nagios/bin/nagios) and the wrong nagios.cfg (/etc/nagios/nagios.cfg instead of /usr/local/nagios/etc/nagios.cfg). The binary in /usr/sbin was a different version, probably older junk left over from a past install.

I changed these values in the script and now Nagios starts with Ndo2db at boot time and everything seems to work again! No goats required!
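For anyone who runs into the same thing, here is a rough sketch of a check that lists the Nagios paths an init script refers to. The function names are made up for illustration, and it assumes a grep that supports -o (GNU and BSD grep both do). It only sees paths written out literally; paths built from variables are not expanded and may appear truncated, so treat the output as a hint rather than proof.

```shell
#!/bin/sh
# List the literal absolute paths in an init script that end in
# "/nagios" or "/nagios.cfg", one per line. Comment lines are skipped.
list_nagios_paths() {
    grep -v '^[[:space:]]*#' "$1" \
        | grep -oE '/[A-Za-z0-9_./-]+' \
        | grep -E '/nagios(\.cfg)?$' \
        | sort -u
}

# Say for each listed path whether it exists as a regular file.
report_nagios_paths() {
    list_nagios_paths "$1" | while read -r path; do
        if [ -f "$path" ]; then
            echo "found:   $path"
        else
            echo "missing: $path"
        fi
    done
}

# Hypothetical usage:
#   report_nagios_paths /etc/init.d/nagios
```

A path that is found but is not the one you start by hand is the interesting case, since that is exactly the leftover-install situation described above.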

luca · September 23, 2009, 8:36pm

Do you usually install from source? If so, you changed something in the configure script…
Good to know you solved it

locust76 · September 24, 2009, 8:17am

I install directly from the source downloaded from nagios.org according to the directions supplied there for my Linux distro… Though it is very possible that I did something wrong when I first installed it. (I pretty much first started using Linux when I decided to install Nagios, so it’s very possible :) ) I suspect that I may have originally installed it as the wrong user or something, but I never consciously changed any directories or install scripts. Admittedly I’ve been finding weird problems like this ever since