Application monitoring and check_multi


#1

There are several hosts that run some of the same processes. I check these processes on each host with check_multi. Now I want one service per process that shows the status of all processes with that name across the different hosts. (E.g. the process Analyse.sh runs on host1, host4 and host7; the service Analyse.sh should show whether all of them are up or not.)
Normally this could be done with a service group, but because I use check_multi, I cannot reach the individual subchecks (or is that possible?).
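Reading "show whether all of them are up" as "report the worst state of that process across its hosts", the aggregation itself is small. A minimal sketch, assuming the per-host results are already available as Nagios return codes (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN); `worst_state` is a hypothetical helper name, not part of check_multi:

```sh
#!/bin/sh
# worst_state CODE...: aggregate Nagios return codes into one state
# (hypothetical helper). Ranking used here:
#   OK (0) < UNKNOWN (3) < WARNING (1) < CRITICAL (2)
worst_state() {
    rank_max=0
    result=0
    for s in "$@"; do
        case "$s" in
            0) rank=0 ;;
            3) rank=1 ;;
            1) rank=2 ;;
            2) rank=3 ;;
            *) rank=1; s=3 ;;   # anything unexpected counts as UNKNOWN
        esac
        # keep the code with the highest rank seen so far
        if [ "$rank" -gt "$rank_max" ]; then
            rank_max=$rank
            result=$s
        fi
    done
    echo "$result"
}

# Example: Analyse.sh is OK on host1 and host4 but CRITICAL on host7.
worst_state 0 0 2    # prints 2
```

Ranking UNKNOWN below WARNING is a design choice of this sketch; taking a plain numeric maximum would rank UNKNOWN above CRITICAL instead.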

Solution attempts

  1. Check the return text of all the check_multi runs.

How can I collect all the return text and turn it into services?

Any other ideas?


#2

Update:

The approach I use now is the following:

  1. For every application (i.e. process) to monitor, there is one passive service on each host (number of hosts times number of processes services in total).

  2. One check_multi run is executed via check_by_ssh and does all the checking.

  3. The return value of check_multi is parsed.

  4. A passive service check result is sent for each service in the check_multi command file. For this, I wrote a script that wraps the check_by_ssh command.
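Steps 3 and 4 boil down to writing one line per service to the Nagios command file in the external command format `[timestamp] PROCESS_SERVICE_CHECK_RESULT;host;service;return_code;plugin_output`. A minimal sketch of that write, assuming a `date` that supports `+%s`; `submit_result` and the example host and service names are hypothetical:

```sh
#!/bin/sh
# submit_result CMDFILE HOST SERVICE CODE OUTPUT: append one passive
# service check result to a Nagios command file (hypothetical helper).
# OUTPUT must be a single line; the command format is line based.
submit_result() {
    cmdfile=$1 host=$2 service=$3 code=$4 output=$5
    # epoch seconds; date +%s is a widespread extension, not available
    # everywhere (the script later in this thread uses perl instead)
    now=$(date +%s)
    printf '[%s] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%s;%s\n' \
        "$now" "$host" "$service" "$code" "$output" >> "$cmdfile"
}

# Example: write to a scratch file instead of the real nagios.cmd pipe.
tmp=$(mktemp)
submit_result "$tmp" host7 Analyse.sh 2 "PROCS CRITICAL: 0 processes"
cat "$tmp"
```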

However, parsing the return value is a bit nasty.
Do Nagios or check_multi have an option to produce output that is already in the format of a passive service check result?

It would be gorgeous if I could just pipe the output of check_multi directly into the command file (i.e. var/rw/nagios.cmd)!
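If the per-service results were already available as `service;code;output` lines (a hypothetical intermediate format, not something check_multi is known to emit), piping them into the command file would only need a small filter that adds the timestamp and host name. Since nagios.cmd is normally a named pipe, the sketch checks for that before writing. `to_commands` and `send_to_nagios` are hypothetical helpers:

```sh
#!/bin/sh
# to_commands HOST: turn "service;code;output" lines on stdin into
# PROCESS_SERVICE_CHECK_RESULT lines on stdout (hypothetical helper).
to_commands() {
    host=$1
    now=$(date +%s)
    while IFS= read -r entry; do
        [ -n "$entry" ] || continue          # skip empty lines
        printf '[%s] PROCESS_SERVICE_CHECK_RESULT;%s;%s\n' \
            "$now" "$host" "$entry"
    done
}

# send_to_nagios CMDFILE: copy stdin into the command file, but only if it
# is a named pipe (opening a FIFO for writing blocks until a reader, i.e.
# a running Nagios, has it open).
send_to_nagios() {
    if [ -p "$1" ]; then
        cat > "$1"
    else
        echo "send_to_nagios: $1 is not a named pipe" >&2
        return 1
    fi
}

# Example (printed to stdout here rather than sent to nagios.cmd):
printf 'Analyse.sh;0;PROCS OK: 1 process\nImport.sh;2;PROCS CRITICAL: 0 processes\n' |
    to_commands host1
```

In real use the two would be chained, e.g. `... | to_commands host1 | send_to_nagios /opt/local/nagios/var/rw/nagios.cmd`.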


#3

[quote="thierry"]However, the parsing of the return value is a little nasty.
Does nagios resp. the check_multi have an option that the return value is already in a format that suits the passive service check result?

It would be gorgeous if I could just pipe the return value of the checkmulti directly to the command file (ie. var/rw/nagios.cmd)![/quote]

Oh, that’s funny, I’m just working on the same idea at the moment :wink:

It’s still alpha, but you can see the work in progress here.

The basic idea is to:

  1. use the XML output option -r 256 to get structured output,
  2. add an event handler to the check_multi service, and
  3. feed the Nagios command interface from this event handler (using PROCESS_FILE, which is faster than piping all commands).
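The PROCESS_FILE part of this idea can be sketched as: write all results to a spool file first, then send a single `PROCESS_FILE;<file_name>;<delete>` external command, so Nagios reads the whole batch from the file instead of receiving one pipe write per service. This is a sketch under assumptions, not the code from the work in progress: `batch_submit` and the spool directory are hypothetical, the directory must be readable by the Nagios user, and the path handed to Nagios should be absolute.

```sh
#!/bin/sh
# batch_submit CMDFILE SPOOLDIR: collect external command lines from stdin
# into one spool file, then hand that file to Nagios with a single
# PROCESS_FILE command (hypothetical helper).
batch_submit() {
    cmdfile=$1 spooldir=$2
    spool=$(mktemp "$spooldir/passive.XXXXXX") || return 1
    cat > "$spool"            # file is complete before Nagios hears of it
    chmod 644 "$spool"        # mktemp creates it 0600; Nagios must read it
    # delete flag 1: Nagios removes the file after processing it
    printf '[%s] PROCESS_FILE;%s;1\n' "$(date +%s)" "$spool" >> "$cmdfile"
}

# Example: a regular file stands in for nagios.cmd.
dir=$(mktemp -d)
printf '[%s] PROCESS_SERVICE_CHECK_RESULT;host1;Analyse.sh;0;OK\n' "$(date +%s)" |
    batch_submit "$dir/nagios.cmd" "$dir"
cat "$dir/nagios.cmd"
```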

I ran some tests and managed to feed 10,000 passive services into a single Nagios instance.

Cheers,
-Matthias


#4

This sounds very interesting!

Here is the check_by_ssh_ex plugin that I wrote.
The output of check_multi with -r 13 (run through check_by_ssh) is quite easy to parse.

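[blockquote]
#!/bin/sh
#
# check_by_ssh_ex: extended check_by_ssh command for Nagios
# to provide data for passive checks.
#
# Parses the check_multi return string and sends selected data
# to passive services for application monitoring.
#
# Usage: check_by_ssh_ex <hostname>

HOSTNAME=$1
NAGIOS_DIR=/opt/local/nagios
LIBEXEC_DIR=$NAGIOS_DIR/libexec
NAGIOSCMD=$NAGIOS_DIR/var/rw/nagios.cmd

# Run all checks on the remote host in one check_multi call.
# Backticks instead of $(...) keep this working with old Bourne shells;
# assignments are not word-split, so no outer quotes are needed.
MYRETURN=`$LIBEXEC_DIR/check_by_ssh -H "$HOSTNAME" -l nexus -C "$LIBEXEC_DIR/check_multi -r 13 -f $LIBEXEC_DIR/checkmulti_all.cmd"`
EXIT=$?

# On a timeout there is nothing to parse: pass the result through.
if [ "$MYRETURN" = "CRITICAL - Plugin timed out while executing system call" ] ; then
    echo "$MYRETURN"
    exit $EXIT
fi

# The first line lists the children per state, e.g. "... 1 warning (a, b) ...".
# Turn each list into ":a:b:" so a child can be looked up as ":name:".
WARNING=`echo "$MYRETURN" | sed -n 1p | sed -n 's|.* warning (\([^)]*\)).*|:\1:|;s|, |:|g;p'`
CRITICAL=`echo "$MYRETURN" | sed -n 1p | sed -n 's|.* critical (\([^)]*\)).*|:\1:|;s|, |:|g;p'`
UNKNOWN=`echo "$MYRETURN" | sed -n 1p | sed -n 's|.* unknown (\([^)]*\)).*|:\1:|;s|, |:|g;p'`

# Alternative nawk approach (incomplete):
#WARNING="`echo "$MYRETURN" |nawk 'NR=1 {match($0,warning (
#CRITICAL="`echo "$MYRETURN" |
#UNKNOWN="`echo "$MYRETURN" |

# Performance data lines start with "|".
PERFDATA=`echo "$MYRETURN" | sed -n '/^|/p'`

# Timestamp for the external commands (epoch seconds).
mytime=`perl -e 'print time;'`

# NOTE: the address regex is elided ("…") in the post; it presumably matches
# the bracketed prefix of each child line, which the empty s||| then strips,
# leaving "<process> <output>".
echo "$MYRETURN" | sed -n '/^…]/s|||p' | while read process line
do
    STATUS=0; PERF=""
    echo "$WARNING"  | grep ":$process:" >/dev/null && STATUS=1
    echo "$CRITICAL" | grep ":$process:" >/dev/null && STATUS=2
    echo "$UNKNOWN"  | grep ":$process:" >/dev/null && STATUS=3

    # Pick this child's "process::plugin::label=value" entry from the
    # perfdata line and reduce it to "|label=value".
    PERF=`echo "$MYRETURN" | sed -n '/^|.*'"$process"'/p' | sed 's|.* '"$process"'::\([^ ]*\) .*|\1|;s/.*::/|/'`

    echo "[$mytime] PROCESS_SERVICE_CHECK_RESULT;$HOSTNAME;$process;$STATUS;$line $PERF" >>$NAGIOSCMD
done

# Report the overall check_multi result to Nagios as usual.
echo "$MYRETURN"
exit $EXIT
[/blockquote]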
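To try the summary-line parsing from the script above without running check_by_ssh, the same sed expressions can be applied to a saved first line. A sketch, assuming a made-up sample line shaped the way the script's sed patterns expect (real check_multi -r 13 output may differ in detail); `list_for` and `state_of` are hypothetical helpers. Unlike the script, `list_for` prints nothing when a state has no children, instead of echoing the whole line.

```sh
#!/bin/sh
# Made-up first line of a check_multi result (not captured output).
SAMPLE='CRITICAL - 4 plugins checked, 1 critical (Analyse.sh), 2 warning (Import.sh, Export.sh), 1 ok'

# list_for LEVEL: print ":name1:name2:" for the children listed under
# LEVEL in the first line, or nothing if LEVEL does not appear.
list_for() {
    echo "$SAMPLE" | sed -n "1s|.* $1 (\([^)]*\)).*|:\1:|p" | sed 's|, |:|g'
}

# state_of NAME: Nagios return code for one child, same lookup order
# as the script (warning, then critical, then unknown).
state_of() {
    state=0
    list_for warning  | grep ":$1:" >/dev/null && state=1
    list_for critical | grep ":$1:" >/dev/null && state=2
    list_for unknown  | grep ":$1:" >/dev/null && state=3
    echo "$state"
}

for p in Analyse.sh Import.sh Export.sh Cleanup.sh; do
    echo "$p $(state_of "$p")"
done
# Analyse.sh 2
# Import.sh 1
# Export.sh 1
# Cleanup.sh 0
```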