Help Writing Custom Checks


#1

Right now I am setting up a proof-of-concept test system to implement Nagios for monitoring a large number of servers and VPN concentrators. However, I am having trouble finding documentation and APIs for setting up custom checks through NSC++ and/or NRPE.
My understanding (please correct me if I am wrong) is that NSC++ uses NRPE to monitor a Windows host, and that NRPE can also be implemented as a stand-alone solution for monitoring a *nix host.

I currently have the Nagios server up and running, monitoring basic functions on two W2k3 servers, but before I present this to my team I would like to be able to showcase how Nagios can be customized.
Specifically:
1- Monitoring the heap size of an Apache Tomcat JVM. I am mostly looking for an alert if it exceeds a specified limit. I have a batch file that watches the log now, but it only reacts once the service has already crashed; a little warning and the ability to proactively restart the service would be a HUGE improvement and would save me a lot of helpdesk tickets.
2- Monitoring remote host connectivity to a satellite system (e.g. a remote server’s latency and connectivity to a database server)
3- How/where to define event handlers on the remote system (is it the same process as creating a check, simply pointing it to the handler script?)

On a slightly less related note:
I am also looking to monitor several other OSes, namely AIX and VMS. Is NRPE all that I need to monitor these hosts?

If you would happen to know of any resources, or have a snippet of code to get me started on the right track, it would be greatly appreciated.


#2

Hi,

NRPE is just a method of communication between hosts. On your Nagios box, the Nagios process runs check_nrpe -H x.x.x.x -c check_c_partition
Then on your host, NRPE is listening (either in standalone daemon mode or, on Linux, through xinetd). It says “oh, you want me to run check_c_partition… well, according to my configs, I’m supposed to run this script”, so it runs whatever script you have defined and securely returns the output and return code to your Nagios box.

So, all NRPE does is let you run scripts on a remote machine.
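To make the "according to my configs" step concrete, here is a rough Python sketch of the lookup. It is only an illustration of the idea, not how the NRPE daemon is actually implemented. It assumes the usual nrpe.cfg form `command[<name>]=<command line>`; the script path in the sample is made up.

```python
import re

# Matches nrpe.cfg style definitions: command[<name>]=<command line>
COMMAND_RE = re.compile(r"^command\[([^\]]+)\]=(.+)$")


def load_commands(config_text):
    """Map each command name to the command line configured for it."""
    commands = {}
    for raw_line in config_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = COMMAND_RE.match(line)
        if match:
            commands[match.group(1)] = match.group(2).strip()
    return commands


# Hypothetical config entry; the path is a placeholder, not a real plugin.
SAMPLE_CONFIG = """
# custom checks
command[check_c_partition]=/path/to/check_c_partition_script
"""

commands = load_commands(SAMPLE_CONFIG)
# check_nrpe -c check_c_partition would resolve to this command line:
print(commands["check_c_partition"])
```

Asking for a name that is not in the config has nothing to resolve to, so every check you want to call remotely needs its own command entry on the monitored host.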

So, all you have to do to make a new check is write your own script that prints a message and exits with 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN), then point to that script in your nrpe.cfg.
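As a starting point, a check script along those lines could look roughly like the Python sketch below. It assumes "higher is worse" (as with a heap size or CPU percentage) and that the measured value and both limits are passed in as arguments. The script name check_threshold.py and the argument order are made up for illustration; how you actually obtain the value is up to you.

```python
import sys

# Exit codes that Nagios interprets as service states.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate(value, warn, crit, label="value"):
    """Return (exit_code, message) for a reading where higher is worse."""
    if value >= crit:
        return CRITICAL, "CRITICAL - %s is %s (limit %s)" % (label, value, crit)
    if value >= warn:
        return WARNING, "WARNING - %s is %s (limit %s)" % (label, value, warn)
    return OK, "OK - %s is %s" % (label, value)


def main(argv):
    """Hypothetical usage: check_threshold.py <value> <warn> <crit>"""
    try:
        value, warn, crit = (float(arg) for arg in argv[1:4])
    except ValueError:
        # Missing or non-numeric arguments: report UNKNOWN, not OK.
        print("UNKNOWN - usage: check_threshold.py <value> <warn> <crit>")
        return UNKNOWN
    code, message = evaluate(value, warn, crit)
    print(message)
    return code


# When saving this as an executable plugin, end the file with:
# sys.exit(main(sys.argv))
```

The message goes to standard output and the state travels in the exit code, so whatever runs the script (NRPE, check_by_ssh) can hand both back to Nagios.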

I’m not sure if NSC++ lets you define your own commands, but if it does, you could point it at your batch scripts, VBScripts or Python scripts. If I were you, I’d learn some Python so that you can better monitor your Windows boxes.

As for AIX and VMS: if NRPE will build on them, then you can use it. Otherwise you’ll have to use other methods, like check_by_ssh, which logs into a box via ssh, runs a command and returns the output.


#3

Thank you very much for your help!!!
I hadn’t thought of using check_by_ssh; I will give that a shot tonight. However, in toying with the trend reporting today I noticed that Nagios logs a data point (i.e. CPU % rather than “Acceptable”) only when the status changes state. To “fill in” the rest of these data points, would I need to tie in to a MySQL server and parse that to generate “actual” performance graphs? It would be nice to trend data even when it doesn’t break the limit for generating an event.


#4

Nagios records every check result, OK or critical. In the case you’re describing, it sounds like your CPU check is only returning “Acceptable” when it’s OK. See if you can modify your CPU check script to return the detected CPU usage, e.g. “OK - cpu usage acceptable, 12%”.
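If you want the number in a form that graphing tools can pick up, plugins can also append performance data after a "|" in the form label=value[unit];warn;crit. A small Python sketch of building such a line is below. The helper name and the 80/90 thresholds are made up for the example.

```python
def format_output(status, text, perfdata):
    """Build one plugin output line: '<STATUS> - <text> | <perfdata>'.

    perfdata is a list of (label, (value, unit, warn, crit)) tuples.
    """
    parts = []
    for label, (value, unit, warn, crit) in perfdata:
        parts.append("%s=%s%s;%s;%s" % (label, value, unit, warn, crit))
    return "%s - %s | %s" % (status, text, " ".join(parts))


# Hypothetical reading: 12% CPU with warning at 80% and critical at 90%.
line = format_output("OK", "cpu usage acceptable, 12%", [("cpu", (12, "%", 80, 90))])
print(line)  # OK - cpu usage acceptable, 12% | cpu=12%;80;90
```

The part before the pipe is what you read in the web interface; the part after it is meant for tools that store and graph the values.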

Lots of people use applications like PnP or nagiosgraph+cacti to do graphing. They let you make prettier graphs out of any information you want.