We have a very saturated network, and we will often have network timeouts on NRPE even though the service itself is working fine.
The default behavior for check_nrpe when it times out is to issue a CRITICAL error. I’d like to modify this to be a WARNING level, so we can more effectively filter false positives. I’d rather CRITICAL be a failure of the checked value, not of the NRPE call itself.
Any clues?
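You can write a wrapper script that calls NRPE and, when it detects certain behavior, acts as you like. By default, it prints the output provided by NRPE and exits with the same status.
You would do something like the following: move the real binary aside, create an executable wrapper in its place, and put the script in the wrapper. This is an unsupported example that may break things badly for you.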
mv <pathtonrpe> <pathtonrpe>-real
touch <pathtonrpe>
chmod 755 <pathtonrpe>
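#!/bin/sh
# Run the real plugin, keeping its output and exit status.
data="$("${0}-real" "$@")"
status="$?"
# Nagios exit codes: 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN.
if [ "$data" = "My hated return" ]
then echo "WARNING: I don't think this is critical."; exit 1
fi
echo "$data"
exit "$status"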
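An exact string comparison is brittle, because the timeout message usually includes the number of seconds. A slightly more forgiving variant is sketched below. It assumes the timeout output contains the text "Socket timeout", which you should verify against what your check_nrpe version actually prints. remap_status is a hypothetical helper name, not part of NRPE.

```shell
#!/bin/sh
# Sketch only: decide which exit status the wrapper should return.
# Nagios exit codes: 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN.

# remap_status OUTPUT STATUS
# Prints 1 (WARNING) when a CRITICAL result looks like an NRPE timeout,
# otherwise prints STATUS unchanged.
remap_status() {
    output=$1
    status=$2
    case "$output" in
        *"Socket timeout"*)   # assumed wording; adjust to your version
            if [ "$status" -eq 2 ]; then
                echo 1
                return
            fi
            ;;
    esac
    echo "$status"
}

# Usage inside the wrapper (after moving the real binary to "${0}-real"):
#   data="$("${0}-real" "$@")"
#   status="$?"
#   echo "$data"
#   exit "$(remap_status "$data" "$status")"
```

Matching on a substring and on the CRITICAL status together keeps genuine CRITICAL results from the checked service untouched, which is the distinction the question asks for.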