Today is April 11, 2005. I’ve searched and read as many forum topics as I could find related to the problem I’m experiencing, but have found no solution yet. I’ve been through the Nagios documentation a number of times to make sure I didn’t miss something. I can’t find a resolution to my problem, so I’m posting here.
The most basic monitor, check_host_alive, isn’t working.
I followed the manual exactly and compiled Nagios 1.2 on RedHat 7.3. I have configured the CFG files, all using the default locations and other values, because I didn’t make any changes during the Make process. I’ve confirmed all paths.
I can log in to my Nagios installation and see my two hosts. I configured them to perform a check_host_alive, which equates to a ping command. Nagios reports the hosts are Down, although I can run the command from the shell prompt and the hosts are alive and well (not down).
I enabled the check_nagios_process (for fun) and it tells me that Nagios isn’t running, although it does display the correct PID. I’m not getting any error messages in the Nagios event log or the shell console.
I haven’t tried any other monitoring commands, thinking that it doesn’t get easier than ping to troubleshoot. I can provide contents of CFG files, and so forth, but I’m not sure what info would be the most helpful to post.
I’m still relatively new with Linux but not a complete noob. Part of me is thinking the nagios user is lacking permission to run the ping command, but I’m not sure and the documentation says little more than to create a user object named “nagios”.
su - nagios
and try to ping… but I don’t think this is the case.
Is nagios complaining about any errors while starting?
Try configuring an http check and see what happens. The check_host_alive is quite a strange beast, even if it should work correctly when no other services are defined…
Another thing which COULD be useful is disabling the service check and enabling it again two minutes later. Services are not always executed the first time they are inserted in the config files, even if they look enabled.
It shouldn’t; that was just to be sure the nagios user can in fact execute the check_ping command.
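One way to see what Nagios would see when it runs a plugin is to look at the plugin's exit code rather than only its text output, since plugins report state through exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). Below is a minimal sketch of that idea; `run_plugin` is a hypothetical helper, not part of Nagios, and it would need to be run as the nagios user (for example after `su - nagios`) to test that user's permissions.

```python
import subprocess
import sys

# Standard Nagios plugin exit codes.
STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def run_plugin(argv):
    """Run a plugin command line and return (state_name, first_output_line)."""
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired) as exc:
        # Missing file, no execute permission, or a hung plugin.
        return "UNKNOWN", str(exc)
    lines = proc.stdout.strip().splitlines()
    first = lines[0] if lines else proc.stderr.strip()
    return STATES.get(proc.returncode, "UNKNOWN"), first


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("usage: run_plugin.py PLUGIN [ARGS...]")
    else:
        state, output = run_plugin(sys.argv[1:])
        print(state, "-", output)
```

For example, assuming the default install path, it could be called with `/usr/local/nagios/libexec/check_ping -H 127.0.0.1 -w 100,20% -c 500,60%`. A permission problem on ping would show up here as a non-OK state or an error message instead of a silent failure.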
As for the web interface, does nagios complain on start about missing permissions on the rw directory? It could be reported as off if the lock file isn’t written… but there are so many possibilities that I’m going by trial and error.
Paste the output you see that is telling you check_host_alive is failing. I think you have the common misunderstanding that nagios checks hosts. Well, it doesn’t perform the “host” check unless the service check fails. I spent 2 weeks trying to figure that out.
If you left the cgi.cfg file the way it was, that might be why nagios reports it’s not running.
nagios_check_command=/usr/local/nagios/libexec/check_nagios /usr/local/nagios/var/status.log 5 '/usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg'
The above wouldn’t work for me, so I used the check_nagios.pl script that is included with the plugins.
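As I understand it, the `5` in the command above is an expiry in minutes for status.log: if the log has not been updated within that window, the check treats Nagios as not running, even when a process with the right PID exists. The freshness half of that test can be sketched as follows; `status_log_is_fresh` is a hypothetical helper for illustration, not the plugin's actual code.

```python
import os
import time


def status_log_is_fresh(path, expire_minutes, now=None):
    """Return True if `path` exists and was modified within `expire_minutes`."""
    now = time.time() if now is None else now
    try:
        age_seconds = now - os.path.getmtime(path)
    except OSError:
        return False  # a missing or unreadable log counts as stale
    return age_seconds <= expire_minutes * 60


if __name__ == "__main__":
    # Path taken from the cgi.cfg line above; adjust for your install.
    print(status_log_is_fresh("/usr/local/nagios/var/status.log", 5))
```

If this prints False while Nagios is up, the status log is not being written, which would be worth checking before blaming the check command itself.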
Of late it seems as if nothing is running. All service checks get listed as PENDING, the “next update” times will change, but nothing ever changes from PENDING.
I tried your suggestion with cgi.cfg - no difference. I am getting a permission error when trying to run the Perl script - probably forgot to flag it Execute. I’ll try again. Otherwise, Nagios says it’s not running.
Anyone want to make a few dollars and get this working for me? [Edited Thu May 05 2005, 05:20PM]
I mentioned that Nagios quit spitting out status information as it used to. I did make a change (but didn’t mention) which was the installation of Fruity. So, I dug around a bit within Fruity to see what might surface. I found something.
Not completely understanding the multi-layered facets of host and service checking, I overlooked the fact that, when drilling down into a host, Checks and Services/Checks are different entities. It seems I had enabled Checks but not Services/Checks. I enabled them and, voilà, started getting some action out of Nagios.
But (there’s always a but) I was getting a problem where Nagios wasn’t locating my plugin for check_http. In the log I noticed the path was:
Obviously my check_http isn’t in the root of my file system, so I headed on over to the $USER1$ variable, which precedes every (?) plugin. According to Fruity my $USER1$ was empty. I added the following:
/usr/local/nagios/libexec
Restarted Nagios (after exporting from Fruity) and, yay, I got my first successful status of OK for check_http.
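To illustrate why an empty $USER1$ sends Nagios looking in the file system root, here is a minimal sketch of macro substitution in a command line. It is not how Nagios itself is implemented; `expand_macros`, the `check_http -H $HOSTADDRESS$` template, and the 192.0.2.10 address are assumptions made for the example.

```python
import re


def expand_macros(command_line, macros):
    """Replace $NAME$ tokens with values from `macros`; unknown macros become ''."""
    return re.sub(
        r"\$([A-Z0-9_]+)\$",
        lambda m: macros.get(m.group(1), ""),
        command_line,
    )


template = "$USER1$/check_http -H $HOSTADDRESS$"

# With an empty $USER1$, only the leading slash of the path survives.
broken = expand_macros(template, {"USER1": "", "HOSTADDRESS": "192.0.2.10"})

# With $USER1$ set to the plugin directory, the full path is produced.
fixed = expand_macros(
    template,
    {"USER1": "/usr/local/nagios/libexec", "HOSTADDRESS": "192.0.2.10"},
)

print(broken)  # /check_http -H 192.0.2.10
print(fixed)   # /usr/local/nagios/libexec/check_http -H 192.0.2.10
```

The first result points at the root directory, which matches the symptom described above; the second matches the working setup after $USER1$ was filled in.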
Next steps…
Finish configuring the Service information for my hosts and try some other plugins.
Be rid of the “Nagios is not running” error, which is likely related to running in Test, not Daemon, mode.
Thanks to anyone who posted feedback. Having someone post a reply gave me encouragement and hints that kept me going and considering other things to try.
Yea, it helps to simply double check all of your .cfg files and surely, having “checks_enabled 0” might have been a huge reason why you had trouble. Good job, and hope you have more fun in the future.