Missing Exit in Nagios Startup Script?

derekbrewer · August 19, 2011, 3:49pm

I had an issue a little while back where I restarted Nagios to pick up some config changes and Nagios did not stop properly. I got the “Warning - nagios did not exit in a timely manner” message. The strange thing is that the startup script kept running and started another instance of Nagios. It seems to me that if Nagios does not stop within the 10-second wait, the script should print the error message and exit; otherwise it can run into problems like I did.

[code]stop)
            echo -n "Stopping nagios: "

            pid_nagios
            killproc_nagios nagios

            # now we have to wait for nagios to exit and remove its
            # own NagiosRunFile, otherwise a following "start" could
            # happen, and then the exiting nagios will remove the
            # new NagiosRunFile, allowing multiple nagios daemons
            # to (sooner or later) run - John Sellens
            #echo -n 'Waiting for nagios to exit .'
            for i in 1 2 3 4 5 6 7 8 9 10 ; do
                if status_nagios > /dev/null; then
                    echo -n '.'
                    sleep 1
                else
                    break
                fi
            done
            if status_nagios > /dev/null; then
                echo ''
                echo 'Warning - nagios did not exit in a timely manner'
            else
                echo 'done.'
            fi

            rm -f $NagiosStatusFile $NagiosRunFile $NagiosLockDir/$NagiosLockFile $NagiosCommandFile
            ;;[/code]

It’s sort of funny that the comments in that snippet talk about waiting for Nagios to exit so that multiple daemons don’t get started, but as far as I can tell, the loop just waits up to 10 seconds and then the script carries on whether or not Nagios actually stopped. Anybody else have thoughts on this? Worst case, I think exiting at that point is just an extra safety check for the daemon.
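
For what it's worth, here is a rough sketch of the kind of change I have in mind, not a tested patch against the real init script. The `wait_for_exit` helper is a name I made up; it takes the name of a status command (such as the script's `status_nagios`) and a timeout in seconds, and returns non-zero if the process is still running when the timeout expires. I'm also assuming that `restart` runs `stop` and then `start`, in which case it would have to check the result of `stop` for the exit to actually prevent a second daemon.

```shell
#!/bin/sh
# Hypothetical helper (not part of the stock Nagios init script).
# $1: command that succeeds while the daemon is still running
# $2: seconds to wait (defaults to 10)
# Returns 0 once the daemon has exited, 1 if it is still running at timeout.
wait_for_exit() {
    status_cmd=$1
    timeout=${2:-10}
    i=0
    while [ "$i" -lt "$timeout" ]; do
        if "$status_cmd" > /dev/null 2>&1; then
            printf '.'
            sleep 1
        else
            return 0
        fi
        i=$((i + 1))
    done
    # One last check after the final sleep.
    if "$status_cmd" > /dev/null 2>&1; then
        return 1
    fi
    return 0
}

# Sketch of how the stop) branch might use it (left commented out because
# status_nagios and the Nagios* variables only exist in the real script):
#
#     if ! wait_for_exit status_nagios 10; then
#         echo ''
#         echo 'Warning - nagios did not exit in a timely manner'
#         exit 1    # bail out before the rm -f and before any "start"
#     fi
#     echo 'done.'
#     rm -f $NagiosStatusFile $NagiosRunFile $NagiosLockDir/$NagiosLockFile $NagiosCommandFile
```

Exiting before the `rm -f` also means the run and lock files are left alone while the old daemon is still alive, which seems to be what the John Sellens comment was worried about in the first place.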