Configuring check_by_ssh


#1

I recently set up Nagios and I’m trying to make it more functional. I’d like to use the check_by_ssh module to get information from remote hosts, such as disk usage (df -h) and CPU utilization (w). Does anyone have a working example they could share? Sorry for the newbie post, but I couldn’t find much on this anywhere.


#2

check_by_ssh is a plugin. Type ./check_by_ssh --help to see the syntax. For example:
./check_by_ssh -H testserver -C 'df -k'

I have the nagios user able to ssh to testserver without being prompted for a password, since I’ve already created the authorized keys. Run ssh-keygen to create your public/private key pair, then add the public key to the remote user’s ~/.ssh/authorized_keys file.


#3

I did
./check_by_ssh -H client.machine -i /path/to/id_dsa -l nagios -C 'df -H'
and it returns


Is that information Nagios can understand? How can I set this up to send a notification if a filesystem is at 90% or above?


#4

No, Nagios won’t know what to do with that output. I’m running it by hand right now, and I notice that I only get one line of output from the df -k command, when there could actually be several lines. I’d check that your host understands df -H, since many *nix systems don’t.
It looks to me like we need to find a way to get the full output returned, which may mean tweaking the script ourselves.


#5

I know the client machine does understand df -h (the -H above was a typo). How is check_by_ssh usually implemented? I feel like I must be missing something.


#6

I’ve never used it, since I do all of my checks with SNMP or by having Nagios on the machine send me data using the NRPE agent. If the machine is running SNMP, then you could get the ‘df -h’ information that way. It looks to me like check_by_ssh could use more work, since I see no way to make it give me more than one line of data. It runs the command ‘whoami’ just fine.


#7

I’m messing around with it by piping the command to awk, kind of like this:
./check_by_ssh -H testserver -C "df -k | awk '{print \$1}'"
When I get the correct syntax, I’ll post back, if you haven’t done it already.
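
In case it helps with the awk part, here is a rough sketch of how the df output could be boiled down to just the mount point and the percent used. The function name df_usage is made up for illustration, and it assumes POSIX-format output (df -P), where each filesystem stays on one line and the fifth column is the capacity. Mount points containing spaces would break it.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nagios or check_by_ssh):
# reduce `df -P` output on stdin to "mountpoint percent" pairs.
df_usage() {
    # Skip the header row. Column 5 is the capacity (e.g. 42%),
    # column 6 is the mount point. Strip the % sign before printing.
    awk 'NR > 1 { sub(/%/, "", $5); print $6, $5 }'
}

# Local usage:
#   df -P | df_usage
# Remote usage (note the escaped $ inside the double quotes):
#   ./check_by_ssh -H testserver -C "df -P | awk 'NR > 1 { print \$6, \$5 }'"
```

Running it locally first makes it easier to get the quoting right before wrapping it in check_by_ssh.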


#8

Looking at it more closely, you may notice there is no built-in way to get a “critical”, “warning” or “ok” status back by setting limits. So even if we got the output we wanted, such as “2% free”, we would not be able to get a warning in Nagios. All you would get is the output displayed. It would take a great deal of rewriting to get this script to work as it should. I’m not even sure what it is good for.
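
For what it’s worth, Nagios plugins report their state through the exit code (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN), so one possible workaround is a small script on the remote host that does the threshold comparison itself. The sketch below is only an illustration: check_usage is a made-up name, the "mountpoint percent" input format is my own assumption, and whether check_by_ssh hands the remote exit code back to Nagios is something to confirm with --help on your version.

```shell
#!/bin/sh
# Hypothetical plugin-style check (illustration only).
# Reads "mountpoint percent" pairs on stdin, prints a single status line,
# and returns a Nagios plugin exit code: 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN.
# Usage: check_usage WARN_PERCENT CRIT_PERCENT
check_usage() {
    awk -v warn="$1" -v crit="$2" '
        BEGIN { state = 0; seen = 0; msg = "" }
        {
            seen = 1
            if ($2 + 0 >= crit) {
                # Critical always wins over warning.
                state = 2
                msg = msg " " $1 "=" $2 "%"
            } else if ($2 + 0 >= warn) {
                if (state < 1) state = 1
                msg = msg " " $1 "=" $2 "%"
            }
        }
        END {
            if (!seen) { print "DISK UNKNOWN - no input"; exit 3 }
            label = (state == 2) ? "CRITICAL" : ((state == 1) ? "WARNING" : "OK")
            if (msg == "") msg = " all filesystems below thresholds"
            print "DISK " label " -" msg
            exit state
        }'
}

# Example: warn at 80%, critical at 90%
#   printf '/ 95\n/data 10\n' | check_usage 80 90
```

Everything is squeezed onto one line of output on purpose, since only one line seems to come back through check_by_ssh.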

It might take a bit of work, but I’d just set up a barebones Nagios on each of the machines that you are interested in. That’s what I did. It’s easier for your “main” Nagios machine to process 1000 passive checks than 1000 active checks anyway.

I have Nagios set up on several Solaris machines to check disk space, CPU, Oracle tablespace free, 50 or more “is a process running” checks, etc. It’s well worth the effort, and the guys at work just love what Nagios is checking.
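
To give an idea of what a passive check result looks like on the receiving side, here is a minimal sketch that formats one as a Nagios external command line (PROCESS_SERVICE_CHECK_RESULT). The function name passive_result_line and the host and service names in the example are placeholders I made up. How the line reaches the main server (the command file path, or a transport such as NSCA) depends on your installation and is not shown.

```shell
#!/bin/sh
# Hypothetical helper (illustration only): format a passive service check
# result in the Nagios external command syntax:
#   [timestamp] PROCESS_SERVICE_CHECK_RESULT;host;service;return_code;output
# $1 = timestamp (epoch seconds)   $2 = host name   $3 = service description
# $4 = return code (0-3)           $5 = plugin output text
passive_result_line() {
    printf '[%s] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%s;%s\n' "$1" "$2" "$3" "$4" "$5"
}

# Example with placeholder names:
#   passive_result_line "$(date +%s)" testserver Disk 2 'DISK CRITICAL - /=95%'
```

The host name and service description have to match what is defined on the main Nagios server, otherwise the result is ignored.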