A newbie with a Multi-Processor Compaq Server


#1

Can anyone help a newbie?

I have a Compaq server with 4 Xeon CPUs and I am trying to monitor the CPU load.

‘check_local_procs’ command definition

define command{
    command_name check_procs
    command_line $USER1$/check_procs -w $ARG1$ -c $ARG2$ -s $ARG3$
    }

In the Nagios.log file I am getting this message.
“SERVICE ALERT: Thorium;Avg CPU Load 15min;UNKNOWN;SOFT;2;Critical Process Count must be an integer!”

Does this make any sense?

Also, on the same server I am struggling to get email notifications working and to monitor the partitions. Could I get some pointers?

Thank You in advance…

Rap


#2

It looks like you didn’t pass an integer for -c, so paste the check_command you are using.

“Process Count must be an integer” so use an integer value.
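
To show how the pieces fit together, here is a sketch of a command and service pair. Each value after a ! in check_command fills the next $ARGn$ macro, so -w and -c only receive integers if the first two values are integers. The thresholds 250 and 400, the status flags RSZDT, the service name and the generic-service template are assumptions borrowed from the stock sample config, not taken from your setup.

```
define command{
    command_name    check_procs
    command_line    $USER1$/check_procs -w $ARG1$ -c $ARG2$ -s $ARG3$
    }

define service{
    use                     generic-service   ; assumed template name
    host_name               Thorium
    service_description     Total Processes   ; assumed name
    check_command           check_procs!250!400!RSZDT
    }
```

If the second value is missing, or is a load-style threshold such as 10.0,6.0,4.0, then -c is not an integer and check_procs would most likely return UNKNOWN with the message you saw.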

For the email, do this:
which mail
/bin/mail

If mail is installed, it should work.
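
For what it's worth, Nagios simply pipes the notification text into that mail binary. Below is a cut-down sketch of a notification command, modelled on the stock notify-by-email definition. The message wording is an assumption, and /bin/mail should be whatever path `which mail` printed on your box.

```
define command{
    command_name    notify-by-email
    ; printf builds the message body; mail sends it to the contact's address
    command_line    /usr/bin/printf "%b" "Service: $SERVICEDESC$\nHost: $HOSTNAME$\nState: $SERVICESTATE$\n" | /bin/mail -s "$NOTIFICATIONTYPE$ alert - $HOSTNAME$/$SERVICEDESC$ is $SERVICESTATE$" $CONTACTEMAIL$
    }
```

The contact also needs an email address set and this command listed as its service notification command, otherwise nothing gets sent.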

For the partitions, look at:
./check_disk --help


#3

check_local_disk!20%!15%!/home


#4

In the Nagios.log file I am getting this message.
“SERVICE ALERT: Thorium;Avg CPU Load 15min;UNKNOWN;SOFT;2;Critical Process Count must be an integer!”

So why did you paste the check disk command? Your problem was with the check_procs plugin wasn’t it?

Your check disk looks fine to me, so what’s the problem? You pasted the command definition for check_procs and the error, but not the command. You pasted the command for the check disk, but not the definition or the error. What gives? Are we working on this with some sort of new method?
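
For reference, a check_command like check_local_disk!20%!15%!/home would normally pair with a definition along these lines. This is only a sketch based on the stock sample config. The actual definition on that server was not posted, so treat the flag order as an assumption.

```
define command{
    command_name    check_local_disk
    ; $ARG1$ = warning threshold, $ARG2$ = critical threshold, $ARG3$ = partition
    command_line    $USER1$/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$
    }
```

With that definition, the check warns when free space on /home drops below 20% and goes critical below 15%.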


#5

Sorry

‘check_local_procs’ command definition

define command{
command_name check_procs
command_line $USER1$/check_procs -w $ARG1$ -c $ARG2$ -m metric=CPU

    service_description             Avg CPU Load 15min
    is_volatile                     0
    check_period                    24x7
    max_check_attempts              3   
    normal_check_interval           5   
    retry_check_interval            1   
    contact_groups                  firm_wide
    check_command                   check_procs!15!80

This is what I am trying now

[Edited Sat Jul 30 2005, 07:34AM]


#6

" service_description Avg CPU Load 15min "
I don’t know why you are calling it CPU load average and then using the check_procs plugin. That plugin does not report the 15-minute load average.


#7

I’m assuming here that you are just trying to monitor the local CPU load. The main function of check_procs is to check processes, not processors. The CPU metric is used to see whether one of the processes you are monitoring is using more than a specified amount of the CPU. From check_procs --help:
check_procs -w 10 -c 20 --metric=CPU
Alert if cpu of any processes over 10% or 20%

To get the local CPU usage, you could try check_load. A good place to look for an overview of the commands is here
(basically it’s a big list of check_command --help output)
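
Something along these lines should get you a load check. It is only a sketch: the command name check_local_load, the generic-service template and the thresholds are all assumptions, so tune the numbers for a 4-CPU box. check_load takes three comma-separated values for the 1, 5 and 15 minute load averages.

```
define command{
    command_name    check_local_load
    command_line    $USER1$/check_load -w $ARG1$ -c $ARG2$
    }

define service{
    use                     generic-service   ; assumed template name
    host_name               Thorium
    service_description     Avg CPU Load
    ; warning at 8.0/6.0/4.0 and critical at 12.0/10.0/8.0 (illustrative values)
    check_command           check_local_load!8.0,6.0,4.0!12.0,10.0,8.0
    }
```

Newer releases of the plugins also have a -r (--percpu) option that divides the load by the number of CPUs. Check check_load --help on your install to see whether your version has it.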


#8

Oh, and the one mail issue that I had, as I was routing the mail through the local Exchange server, was remembering to configure the .mailrc file.


#9

I’ll post this just in case anyone is searching the archives for multi-CPU stuff. If it is a remote multi-CPU NT/2000 box that you want to check, then you can use check_nt as below:

check_nt -H -p 1248 -v COUNTER -l "\Processor(0)\% Processor Time","CPU 1 Load" -w 90 -c 95 -t 30

check_nt -H -p 1248 -v COUNTER -l "\Processor(1)\% Processor Time","CPU 2 Load" -w 90 -c 95 -t 30

…etc etc

This matters because the normal CPU check using check_nt/NSClient averages the load across the CPUs (i.e. on a dual-CPU machine one CPU could be at 100% and the other at 0%, and you’d get a 50% reading and no warning).
[Edited Mon Aug 01 2005, 12:33AM]