I need to monitor whether at least one instance of a certain process is running on a remote Linux machine. I can do it with either check_by_ssh or NRPE. But I want to confirm how each one alarms when the process is not running, so correct me if I am wrong…
NRPE: When the process is not running, NRPE together with the check_procs plugin will change the service state to CRITICAL. It will also do so when Nagios is not able to communicate with the remote machine.
check_by_ssh: will only generate a change of state in the service when the plugin is not able to connect. If the process is not running, it will show OK, with output saying that the process is not running (check_by_ssh will execute “ps fax | grep process | grep -v grep | wc -l”). So it will only alert me when Nagios is not able to communicate via SSH.
If “ps fax | grep process | grep -v grep | wc -l” happens to output a 0, it still exits with status 0. A pipeline's exit status is that of its last command, and wc succeeds whatever the count is (it’ll run successfully every time unless the wc binary magically disappears). So the only way for check_by_ssh to return an error code, and as a result have the service change state, is if check_by_ssh itself fails, for example by timing out, in which case you’ll get a CRITICAL with a timeout failure message.
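A quick way to see this for yourself is to run the pipeline against a name that matches nothing and look at the exit status. This is only a sketch; no_such_process_xyz is a made-up name used for illustration.

```shell
#!/bin/sh
# Count ps lines that mention a process name which should not exist.
# "no_such_process_xyz" is a placeholder, not a real process.
count=$(ps ax | grep no_such_process_xyz | grep -v grep | wc -l)

# $? holds the pipeline's exit status, which is the status of wc,
# the last command in the pipeline.
status=$?

echo "count=$count status=$status"
```

The status stays 0 even though nothing matched, which is why Nagios keeps reporting OK.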
You could also make a script that runs that command, then does an if statement (i.e.: PROCS=$(ps fax | grep process | grep -v grep | wc -l) ; if [ "$PROCS" -eq 0 ] ; then echo "omg omg omg" ; exit 2 ; fi ) and that would make either of the methods return CRITICAL.
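Spelled out as a small plugin-style script, the idea could look roughly like this. It is a sketch, not a finished plugin: count_procs and report_procs are hypothetical helper names, and the counting reuses the pipeline from this thread.

```shell
#!/bin/sh
# Sketch of a Nagios-style process check (helper names are illustrative).

# Count ps lines mentioning the given name, using the pipeline from the thread.
count_procs() {
    ps ax | grep "$1" | grep -v grep | wc -l
}

# Print a status line and return the Nagios exit code for a given count:
# 0 = OK, 2 = CRITICAL.
report_procs() {
    name=$1
    count=$2
    if [ "$count" -eq 0 ]; then
        echo "CRITICAL - $name is not running"
        return 2
    fi
    echo "OK - $count $name process(es) running"
    return 0
}
```

At the end of the script you would call something like report_procs crond "$(count_procs crond)" and then exit $? so the return code reaches Nagios. As far as I know, check_by_ssh passes the remote command's exit code through as the plugin's own state, so the same script should work over SSH or NRPE.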
Thank you for the reply, MP. “omg omg omg” was very funny.
I am currently working on a script like the one you wrote, a little more advanced, where you can define warning and critical thresholds. Together with check_by_ssh, it can check processes or parse data from the remote computer. Maybe I’ll put it on NagiosExchange when I finish. I think it is pretty useful, and you don’t have to install anything on the remote machines.
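For anyone wanting to try the warning/critical idea in the meantime, here is one possible shape for the threshold logic. It is a guess at the approach, not the script described above: check_thresholds and its argument order are hypothetical, and it assumes "too few processes" is the problem being detected.

```shell
#!/bin/sh
# Hypothetical threshold check for a process count.
# Usage: check_thresholds COUNT WARN CRIT
# Fewer than CRIT processes is CRITICAL, fewer than WARN is WARNING,
# anything else is OK. Return codes follow the Nagios convention
# (0 = OK, 1 = WARNING, 2 = CRITICAL).
check_thresholds() {
    count=$1
    warn=$2
    crit=$3
    if [ "$count" -lt "$crit" ]; then
        echo "CRITICAL - $count processes (minimum $crit)"
        return 2
    elif [ "$count" -lt "$warn" ]; then
        echo "WARNING - $count processes (want at least $warn)"
        return 1
    fi
    echo "OK - $count processes"
    return 0
}
```

For example, check_thresholds 1 2 1 reports WARNING, because one process is running but at least two are wanted.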