Near as I can tell, nagios logs a failed service check with “Service check did not exit properly” when it can’t execute the service check plugin. Kinda a poorly worded log message…
I keep getting this failure at random, 'bout once a day using check_snmp_if and I can’t discern why.
I first thought it might be a timeout, but nagios logs those as timeouts.
I then thought it was a process limit, but I’m so under limits it hurts.
So I got cute and made a wrapper that logs when it starts, plus any output from check_snmp_if or the exact error when it can’t execute it. Welp, it happened again, and I don’t see anything in my logs to indicate what happened; all I see are valid runs.
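For anyone who wants to try the same thing, here is a rough sketch of that kind of logging wrapper in Python. It is not the poster's actual script: the log path, the function names, and the choice to report UNKNOWN on failure are all assumptions made for illustration. It only relies on the standard plugin convention that exit codes 0 to 3 mean OK, WARNING, CRITICAL and UNKNOWN.

```python
#!/usr/bin/env python3
"""Logging wrapper for a Nagios plugin (illustrative sketch, not the poster's script)."""
import os
import subprocess
import sys
import time

LOG_PATH = "/tmp/plugin_wrapper.log"  # hypothetical location, pick your own
UNKNOWN = 3  # standard plugin exit code for UNKNOWN


def log(path, message):
    """Append one timestamped line, tagged with the wrapper's PID."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    with open(path, "a") as fh:
        fh.write(f"{stamp} [{os.getpid()}] {message}\n")


def run_and_log(cmd, log_path=LOG_PATH):
    """Run cmd, log start, output and exit status; return (exit_code, output)."""
    log(log_path, f"start: {cmd!r}")
    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # fold stderr into the captured output
            text=True,
        )
    except OSError as exc:
        # The plugin could not be executed at all (missing file, permissions, ...).
        log(log_path, f"exec failed: {exc!r}")
        return UNKNOWN, f"UNKNOWN - could not execute plugin: {exc}"

    output = proc.stdout.strip()
    if proc.returncode < 0:
        # A negative return code means the child was killed by a signal.
        log(log_path, f"killed by signal {-proc.returncode} output={output!r}")
        return UNKNOWN, f"UNKNOWN - plugin killed by signal {-proc.returncode}"

    log(log_path, f"exit={proc.returncode} output={output!r}")
    return proc.returncode, output


# Only act as a wrapper when a plugin command line was actually passed in.
if __name__ == "__main__" and len(sys.argv) > 1:
    code, text = run_and_log(sys.argv[1:])
    print(text)
    sys.exit(code)
```

You would point the Nagios command definition at the wrapper and pass the real plugin and its arguments after it. Note that this only logs what the wrapper itself sees: if Nagios loses the wrapper's output or exit status after the wrapper has finished, the log will still show nothing but valid runs.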
Use check_snmp with the correct options rather than check_snmp_if, because it’s faster. Make sure to specify the correct MIB with the -m option.
Ok, that still doesn’t explain the error condition. My concern is something else is going on, and I’m just going to bump against it later if I swap plugins now.
Not to mention check_snmp just reports one stat (in this case OperStatus), whereas check_snmp_if also looks at AdminStatus when deciding whether to alert. [ Edited Fri Nov 04 2005, 10:56AM ]
I’m now really stumped. At first I thought something was keeping the script from executing; now it appears the script is executing, but something is breaking the communication between nagios and the plugin.