"Service check did not exit properly"

Near as I can tell, nagios logs a failed service check with “Service check did not exit properly” when it can’t execute the service check plugin. Kinda a poorly worded log message…

I keep getting this failure at random, 'bout once a day using check_snmp_if and I can’t discern why.

I first thought it might be a timeout, but nagios logs those as timeouts.

I then thought it was a process limit, but I’m so under limits it hurts.

So I got cute and made a wrapper that logs when it starts, plus any output from check_snmp_if or the exact error when it can’t execute it. Welp, it happened again, and I don’t see anything in my logs to indicate what happened; all I see are valid runs.
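
For anyone who wants to try the same thing, a wrapper along those lines could look roughly like the sketch below. This is not the actual in-house script: the log path, the names run_and_log and main, and the choice to report exec failures as exit code 3 (UNKNOWN) are all assumptions made for illustration.

```python
import subprocess
import sys
import time

# Hypothetical log location; pick somewhere the nagios user can write.
LOG_PATH = "/tmp/plugin_wrapper.log"

UNKNOWN = 3  # standard Nagios plugin exit code for "UNKNOWN"


def run_and_log(argv, log_path=LOG_PATH):
    """Run the plugin in argv, append a log record, return (output, status)."""
    started = time.ctime()
    try:
        proc = subprocess.run(
            argv,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # same effect as 2>&1 on the command line
            universal_newlines=True,
        )
        output, status = proc.stdout, proc.returncode
        if status < 0:
            # A negative return code means the plugin was killed by a signal.
            output += "KILLED BY SIGNAL %d\n" % -status
            status = UNKNOWN
    except OSError as exc:
        # The plugin could not be executed at all (missing file, permissions, ...).
        output, status = "EXEC FAILED: %s\n" % exc, UNKNOWN
    with open(log_path, "a") as log:
        log.write("%s:\nCOMMAND: %s\nOUTPUT: %s\nRETURN_STATUS: %d\n\n"
                  % (started, " ".join(argv), output.rstrip("\n"), status))
    return output, status


def main(argv):
    """Entry point: pass the plugin's output and exit status back to Nagios."""
    if not argv:
        sys.stdout.write("UNKNOWN: no plugin command given\n")
        return UNKNOWN
    output, status = run_and_log(argv)
    sys.stdout.write(output)
    return status
```

To use it as the real check command, the script would end with sys.exit(main(sys.argv[1:])), and the Nagios command definition would call the wrapper followed by the real plugin and its arguments.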

Anyone have further diag suggestions?

Use check_snmp with the correct options rather than check_snmp_if, since it’s faster. Make sure to specify the correct MIB with the -m option.

Ok, that still doesn’t explain the error condition. My concern is something else is going on, and I’m just going to bump against it later if I swap plugins now.

Not to mention check_snmp just reports one stat (in this case OperStatus) vs check_snmp_if which also looks at AdminStatus when deciding to alert or not.

More data…

1cc-r2 atm 4/1/0.1-aal5 paged at 17:45:29 reporting “Service check did not exit properly”

Looking at the logging from our in-house plugin wrapper, I have the following data:

Tue Nov 8 17:40:20 2005:
COMMAND: /usr/local/libexec/nagios/check_snmp_if_2 -H 192.168.1.252 -C public -i 17 2>&1
OUTPUT: OK: Admn:up; Oper:up;
RETURN_STATUS: 0

Tue Nov 8 17:45:20 2005:
COMMAND: /usr/local/libexec/nagios/check_snmp_if_2 -H 192.168.1.252 -C public -i 17 2>&1
OUTPUT: OK: Admn:up; Oper:up;
RETURN_STATUS: 0

Tue Nov 8 17:45:32 2005:
COMMAND: /usr/local/libexec/nagios/check_snmp_if_2 -H 192.168.1.252 -C public -i 17 2>&1
OUTPUT: OK: Admn:up; Oper:up;
RETURN_STATUS: 0

Tue Nov 8 17:50:32 2005:
COMMAND: /usr/local/libexec/nagios/check_snmp_if_2 -H 192.168.1.252 -C public -i 17 2>&1
OUTPUT: OK: Admn:up; Oper:up;
RETURN_STATUS: 0

So, the check DID in fact exit cleanly with appropriate data.

It looks like Nagios didn’t get the exit data, then spawned a new check right after, which generated a recovery page.
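
The out-of-cycle recheck is easy to spot by looking at the gaps between the wrapper's timestamps. Here is a rough sketch that does the arithmetic on the four entries above; the function name gaps_in_seconds is made up for illustration, and treating 300 seconds as the normal check interval is an assumption read off the log, not something taken from the Nagios config.

```python
from datetime import datetime

# Timestamps copied from the wrapper log entries above.
STAMPS = [
    "Tue Nov 8 17:40:20 2005",
    "Tue Nov 8 17:45:20 2005",
    "Tue Nov 8 17:45:32 2005",
    "Tue Nov 8 17:50:32 2005",
]

# Assumed normal check interval, inferred from the log spacing.
NORMAL_INTERVAL = 300


def gaps_in_seconds(stamps):
    """Return the number of seconds between each pair of consecutive timestamps."""
    times = [datetime.strptime(s, "%a %b %d %H:%M:%S %Y") for s in stamps]
    return [int((later - earlier).total_seconds())
            for earlier, later in zip(times, times[1:])]


gaps = gaps_in_seconds(STAMPS)
short = [g for g in gaps if g < NORMAL_INTERVAL]
print(gaps)   # [300, 12, 300]
print(short)  # [12]
```

The 12-second gap is the recheck: the 17:45:20 run finished OK, the page went out at 17:45:29, and the next run started at 17:45:32.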

nagios: 1.2
check_snmp_if: 0.3.5
FreeBSD: 4.11-p12

I’m now really stumped. At first I thought something was keeping the script from executing; now it appears the script is executing, but something is breaking the communication between Nagios and the plugin.

This has been resolved locally; see my post in meulie.net/forum_viewtopic.p … 472.0#4516