I had an issue a little while back where I restarted Nagios to pick up some config changes and Nagios did not stop properly. I got the “Warning - nagios did not exit in a timely manner” message. The strange thing is that the startup script kept running and started another instance of Nagios. It seems to me that if Nagios does not stop within the hard-coded 10 seconds, the script should print the error message and exit; otherwise it can run into problems like I did.
[code]stop)
    echo -n "Stopping nagios: "
    pid_nagios
    killproc_nagios nagios
    # now we have to wait for nagios to exit and remove its
    # own NagiosRunFile, otherwise a following "start" could
    # happen, and then the exiting nagios will remove the
    # new NagiosRunFile, allowing multiple nagios daemons
    # to (sooner or later) run - John Sellens
    #echo -n 'Waiting for nagios to exit .'
    for i in 1 2 3 4 5 6 7 8 9 10 ; do
        if status_nagios > /dev/null; then
            echo -n '.'
            sleep 1
        else
            break
        fi
    done
    if status_nagios > /dev/null; then
        echo ''
        echo 'Warning - nagios did not exit in a timely manner'
    else
        echo 'done.'
    fi
    rm -f $NagiosStatusFile $NagiosRunFile $NagiosLockDir/$NagiosLockFile $NagiosCommandFile
    ;;[/code]
It’s sort of funny that the comments in that snippet talk about waiting for Nagios to exit so that multiple daemons don’t get started, but as far as I can tell, the loop just waits up to 10 seconds and then carries on no matter what, including removing the run file while the old daemon may still be alive. Anybody else have thoughts on this? I think that, worst case, exiting at that point is just an extra safety check for the daemon.
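For what it's worth, here is a rough sketch of the change I have in mind, not something I have tested against a real Nagios install. The function names wait_for_nagios_exit and stop_nagios are made up for illustration. I am assuming status_nagios returns 0 while the daemon is running (which is how the snippet above uses it) and that killproc_nagios and the Nagios* variables are the ones the init script already defines.

```shell
#!/bin/sh
# Sketch only. wait_for_nagios_exit and stop_nagios are hypothetical names.
# status_nagios and killproc_nagios are assumed to be the helpers the
# existing init script already defines; status_nagios is assumed to
# return 0 while nagios is still running.

# Poll once per second for up to $1 seconds (default 10).
# Returns 0 once nagios is gone, 1 if it is still running at the end.
wait_for_nagios_exit() {
    timeout=${1:-10}
    i=0
    while [ "$i" -lt "$timeout" ]; do
        if status_nagios > /dev/null; then
            printf '.'
            sleep 1
        else
            return 0
        fi
        i=$((i + 1))
    done
    # One last check after the final sleep.
    if status_nagios > /dev/null; then
        return 1
    fi
    return 0
}

# Stop nagios; only clean up its files if it really exited.
stop_nagios() {
    printf 'Stopping nagios: '
    killproc_nagios nagios
    if wait_for_nagios_exit 10; then
        echo 'done.'
        rm -f "$NagiosStatusFile" "$NagiosRunFile" \
              "$NagiosLockDir/$NagiosLockFile" "$NagiosCommandFile"
        return 0
    fi
    echo ''
    echo 'Warning - nagios did not exit in a timely manner'
    # Leave the run/lock files alone: the old daemon may still own them.
    return 1
}

# Possible use inside the existing case statement:
#   stop)
#       stop_nagios || exit 1
#       ;;
```

The point of returning non-zero is that a restart could then refuse to call start when the stop failed, so a second daemon never comes up next to the old one. Whether the files should be left in place on failure is a judgment call; I left them because the comment in the original script suggests the exiting daemon removes its own run file.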