Event Handler Problem


#1

Hello everyone :slight_smile:

I’m running Nagios 1.2 and it’s going great, apart from an event handler I’ve set up.

I have Apache running on the Nagios server, and I set up Nagios to email me when Apache isn’t running; that works with no problems.
So in services.cfg I added an event_handler to the check_http service for this machine as well, and called it restart-httpd.

Then in checkcommands.cfg I defined a new command named restart-httpd pointing to a script.

I sorted out sudo so that the nagios user can run this script and restart httpd, so no problems there, I think.

When the event handler sees a CRITICAL SOFT state on the 3rd check attempt, it is supposed to run the restart-httpd script and restart Apache.

Watching the logs, I can see the events coming in, but the attempt count goes from 1 to 5 and the restart-httpd script never runs.

Below are some copy/pastes from the .cfg files and the script.
I’d appreciate any help on this, please.
Many thanks
Nick.

checkcommands.cfg

define command{
    command_name    restart-httpd
    command_line    /usr/local/nagios/libexec/eventhandlers/restart-httpd $SERVICESTATE$ $STATETYPE$ $SERVICEATTEMPTS$
    }

services.cfg

define service{
    host_name               netview
    service_description     check_http
    check_command           check_http
    max_check_attempts      5
    normal_check_interval   5
    retry_check_interval    3
    check_period            24x7
    event_handler           restart-httpd
    notification_interval   30
    notification_period     24x7
    notification_options    w,c,r
    contact_groups          linux-admins
    }

/usr/local/nagios/libexec/eventhandlers/restart-httpd

[code]
# $1 = service state, $2 = state type, $3 = check attempt number
case "$1" in
OK)
    ;;
WARNING)
    ;;
UNKNOWN)
    ;;
CRITICAL)
    case "$2" in
    SOFT)   # What check attempt are we on
        case "$3" in
        3)  echo -n "Restarting Apache - 3rd soft critical state"
            sudo /etc/init.d/httpd restart
            ;;
        esac
        ;;
    HARD)   # Restart Apache regardless
        echo -n "Restarting Apache after Hard failure"
        sudo /etc/init.d/httpd restart
        ;;
    esac
    ;;

esac
exit 0
[/code]

nagios.log

[1118319272] SERVICE EVENT HANDLER: netview;check_http;CRITICAL;SOFT;1;restart-httpd
[1118319452] SERVICE ALERT: netview;check_http;CRITICAL;SOFT;2;Connection refused
[1118319452] SERVICE EVENT HANDLER: netview;check_http;CRITICAL;SOFT;2;restart-httpd
[1118319632] SERVICE ALERT: netview;check_http;CRITICAL;SOFT;3;Connection refused
[1118319632] SERVICE EVENT HANDLER: netview;check_http;CRITICAL;SOFT;3;restart-httpd
[1118319812] SERVICE ALERT: netview;check_http;CRITICAL;SOFT;4;Connection refused
[1118319812] SERVICE EVENT HANDLER: netview;check_http;CRITICAL;SOFT;4;restart-httpd
[1118319992] SERVICE ALERT: netview;check_http;CRITICAL;HARD;5;Connection refused
[1118319992] SERVICE NOTIFICATION: nickl;netview;check_http;CRITICAL;notify-by-email;Connection refused
[1118319993] SERVICE EVENT HANDLER: netview;check_http;CRITICAL;HARD;5;restart-httpd
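
As a quick sanity check on the log above, the gap between consecutive entries can be compared with retry_check_interval 3. This is only a sketch: it assumes the time unit is the usual 60 seconds, which the thread does not show.

```shell
#!/bin/sh
# Epoch timestamps copied from the SOFT;1 and SOFT;2 log entries above.
first=1118319272
second=1118319452

# Gap in seconds, then in minutes (assumes 60-second time units).
gap_seconds=$((second - first))
gap_minutes=$((gap_seconds / 60))

echo "gap: ${gap_seconds}s = ${gap_minutes} min"
```

If the gap comes out at 3 minutes, the retries are on schedule and the event handler is being logged at each attempt, so the scheduling side looks fine and the problem is more likely in how the command itself gets executed.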

#2

Have you run the sudo command manually and checked for errors? From your logs, it seems the script never reached the case statement that checks argument 3, which is why it didn’t restart httpd. What do you think? =)
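
One way to tell whether Nagios is starting the script at all, as opposed to the script failing somewhere inside the case statement, is to log every invocation at the very top of the handler. A minimal sketch; the log path and the function name are made up for illustration and are not part of Nagios:

```shell
#!/bin/sh
# Hypothetical debug log location; pick any file the nagios user can write to.
LOG_FILE="${LOG_FILE:-/tmp/restart-httpd-debug.log}"

# Append a timestamped line with the arguments the handler received.
log_invocation() {
    echo "$(date '+%Y-%m-%d %H:%M:%S') args: $*" >> "$LOG_FILE"
}

# Record this run before any other logic in the handler.
log_invocation "$@"
```

If nothing shows up in the log file while nagios.log keeps printing SERVICE EVENT HANDLER lines, the script is probably not being launched, and the command definition is the next place to look.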


#3

Hi thanks for the reply.

The user nagios can run the restart-httpd script with sudo fine.
Here’s a demo.

[nagios@netview eventhandlers]$ sudo /usr/local/nagios/libexec/eventhandlers/restart-httpd CRITICAL SOFT 3
Restarting Apache - 3rd soft critical stateStopping httpd: [FAILED]
Starting httpd: [  OK  ]

Issuing a restart obviously fails to stop Apache because it’s not running, but the start then works fine.

I left Apache down overnight and this morning I got plenty of email telling me that Apache was down, but nagios did not start it.

I’m really confused; this is the only issue stopping us from going ‘live’ with Nagios.

If you have any more ideas I’d really appreciate hearing them :slight_smile:

Many thanks
Nick


#4

I found the problem - user error!
There was a stray " on the end of the command_line defined in checkcommands.cfg.
Man, I feel dumb now :frowning:

Thanks for your help :slight_smile:
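
For anyone who hits the same thing, here is a rough sketch of why one stray quote is enough to stop the handler. It assumes Nagios passes the expanded command_line to a shell, so an unbalanced " is a shell syntax error and the script is never started. The echo command is a stand-in for the real handler, not the actual restart-httpd script.

```shell
#!/bin/sh
# Stand-in for the handler command line, with and without the stray quote.
good='echo handler ran: CRITICAL SOFT 3'
bad='echo handler ran: CRITICAL SOFT 3"'

# Run each string the way a shell would receive it and keep the exit status.
good_out=$(sh -c "$good" 2>/dev/null); good_rc=$?
bad_out=$(sh -c "$bad" 2>/dev/null); bad_rc=$?

echo "good: rc=$good_rc output='$good_out'"
echo "bad:  rc=$bad_rc output='$bad_out'"
```

The version with the trailing quote exits non-zero without printing anything. That fits what the thread shows: nagios.log keeps reporting the event handler, running the script by hand works, and Apache never gets restarted.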