Monitoring multiple Linux partitions with NRPE

Hello all,

I want to check more partitions with the disk space check command. Should I define my own user-supplied arguments in the NRPE config, or can I change the hard-coded NRPE command so that it checks more than one disk?

Thanks for the help ahead of time.

I have the following command by nagios for checking the disks on my remote server:

command[check_hda1]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p / -p /dev -C -w 100000 -c 50000 -p /usr -p /var -p /home -p /srv -p /tmp

The problem is, no matter what I change in the NRPE config file, Nagios always shows only the “/” (root partition) on the web interface and totally disregards the rest of the command line. I have recompiled NRPE and that has not helped. What is wrong with Nagios? Is there anything wrong with the command I have for NRPE?

Is the check output perhaps spread over multiple lines? (Examine the check output by running the command locally.)
Nagios only uses the first line for the $SERVICEOUTPUT$ macro; the rest is possibly being returned as $SERVICEPERFDATA$ and $LONGSERVICEOUTPUT$ (see the Nagios Plugin API documentation for more information). That is not particularly useful if you want to actively monitor multiple partitions, so I’d be inclined to create multiple entries in the NRPE config file, i.e.

command[check_hda1_root]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p / -C -w 100000 -c 50000
command[check_hda1_var]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /var -C -w 100000 -c 50000
command[check_hda1_usr]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /usr -C -w 100000 -c 50000
...
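
With many partitions, entries like these could be generated rather than typed by hand. A minimal Python sketch, assuming the plugin path and percentage thresholds used in this thread; the function name nrpe_disk_commands and the check_disk_<name> naming scheme are made up for illustration and are not anything NRPE requires:

```python
def nrpe_disk_commands(mounts, warn="20%", crit="10%",
                       plugin="/usr/local/nagios/libexec/check_disk"):
    """Return one nrpe.cfg command line per mount point.

    The command naming scheme (check_disk_<name>) is an assumption made
    for this example only.
    """
    lines = []
    for mount in mounts:
        # "/" becomes "root", "/var" becomes "var", "/var/log" becomes "var_log"
        name = mount.strip("/").replace("/", "_") or "root"
        lines.append(
            f"command[check_disk_{name}]={plugin} -w {warn} -c {crit} -p {mount}"
        )
    return lines


if __name__ == "__main__":
    for line in nrpe_disk_commands(["/", "/usr", "/var", "/home"]):
        print(line)
```

Each generated command then needs its own service definition on the Nagios server, so every partition gets its own status line and alert.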

HTH

/S

I have made a new test command for “/dev” and Nagios returns a “command not defined” error. I have enabled “dont_blame_nrpe=1” in the NRPE config. Does NRPE have to be compiled with an extra option to enable support for command arguments? I have checked both the service definition and the command definition, and both have the same corresponding command name (check_dev). So I think it has something to do with NRPE not being compiled with support for command arguments. Otherwise I am still unsure where the error could come from…

Hi

Are you running NRPE as a standalone daemon? If so, you will need to restart it after changing the config; this is the most likely reason for the ‘command not defined’ error.

‘dont_blame_nrpe=1’ will enable command-line arguments, provided NRPE was built with the --enable-command-args configure option. AFAIK there should be nothing else to do apart from creating a suitable command, i.e.

command[check_disk]=/usr/local/nagios/libexec/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$

Again, if NRPE is running as a standalone daemon, it has to be restarted after enabling this option.
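
To make the $ARGn$ idea concrete, here is a small Python sketch of the substitution. It only mimics the concept and is not NRPE's actual code; the function name expand_args and the choice to expand missing arguments to an empty string are assumptions for this example:

```python
import re


def expand_args(template, args):
    """Replace $ARG1$, $ARG2$, ... in template with the given arguments.

    Illustration only: macros without a matching argument expand to an
    empty string here, which is a choice made for this sketch.
    """
    def replace(match):
        index = int(match.group(1)) - 1  # $ARG1$ maps to args[0]
        return args[index] if 0 <= index < len(args) else ""

    return re.sub(r"\$ARG(\d+)\$", replace, template)


if __name__ == "__main__":
    template = ("/usr/local/nagios/libexec/check_disk "
                "-w $ARG1$ -c $ARG2$ -p $ARG3$")
    print(expand_args(template, ["20%", "10%", "/var"]))
```

On the Nagios server the arguments would be passed through check_nrpe's -a option, for example check_nrpe -H remotehost -c check_disk -a 20% 10% /var (remotehost being a placeholder).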

HTH

/S

That makes sense… I only have the command on the local nagios server. Do I have to have this command in the nrpe config also on the remote Server for the service check?

Yeah so if you have a service check configured in nagios to use NRPE to run check_whatever, you’d have something like the following in the nagios server

define service{
use generic-service
host_name remotehost
service_description My Remote Check
check_command check_nrpe!check_whatever
}

and in your nrpe config on the remote server, a command entry for check_whatever, telling NRPE where the script to run is located on that remote server, like

command[check_whatever]=/usr/local/nagios/libexec/mycheckscript.pl

A (slightly) more detailed explanation of checking custom remote services is given in the NRPE documentation.
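
Since a name mismatch between the two sides is a common source of ‘command not defined’ errors, a quick consistency check can help. A rough Python sketch, assuming the simple config layout shown above (one check_command line per service, one command[...] line per NRPE command); the function names are hypothetical and this is not a full parser for either file format:

```python
import re

# command[name]=... lines in nrpe.cfg (commented-out lines do not match)
COMMAND_RE = re.compile(r"^\s*command\[([^\]]+)\]\s*=")
# check_command check_nrpe!name lines in Nagios service definitions
CHECK_RE = re.compile(r"^\s*check_command\s+check_nrpe!([^!\s]+)")


def defined_commands(nrpe_cfg_text):
    """Names defined by command[...] lines in an nrpe.cfg-style text."""
    return {m.group(1) for line in nrpe_cfg_text.splitlines()
            if (m := COMMAND_RE.match(line))}


def requested_commands(nagios_cfg_text):
    """Names requested through check_nrpe!<name> in service definitions."""
    return {m.group(1) for line in nagios_cfg_text.splitlines()
            if (m := CHECK_RE.match(line))}


def missing_commands(nagios_cfg_text, nrpe_cfg_text):
    """Commands the Nagios server asks for that the remote NRPE lacks."""
    return requested_commands(nagios_cfg_text) - defined_commands(nrpe_cfg_text)


if __name__ == "__main__":
    nagios_cfg = """
    define service{
        use generic-service
        host_name remotehost
        service_description My Remote Check
        check_command check_nrpe!check_whatever
    }
    """
    nrpe_cfg = "command[check_other]=/usr/local/nagios/libexec/mycheckscript.pl\n"
    print(missing_commands(nagios_cfg, nrpe_cfg))
```

Any name it prints is requested by the Nagios server but has no command entry in the NRPE config it was compared against.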

I got it… I just had to place the same command definition in the NRPE config on the remote server as on the local one, and it worked. Now I want to monitor the file systems instead of the mounts in Linux. The current error that Nagios is returning from this command check is the following:

I do not want to run checks on the mounts, but on the partitions where the mounts are… how can I configure nagios/nrpe for this?

P.S. I figured out a command that works for checking multiple HDDs in one command:

[blockquote]I do not want to run checks on the mounts, but on the partitions where the mounts are… how can I configure nagios/nrpe for this?[/blockquote]
?? -p should do it. -M is mountpoints. If your experience of this differs then perhaps the plugin isn’t working as ‘advertised’ in the help…

 -M, --mountpoint
    Display the mountpoint instead of the partition
 -p, --path=PATH, --partition=PARTITION
    Path or partition (may be repeated)

much oddness.

Ok cool, everything is working… thanks so much for your help!!! Although I have noticed something with the service check for the disk. The following pic is from the command line on the remote host. I ran “df” to look at the space left on the partitions, and it is different from what the Nagios service check returns:

img376.imageshack.us/my.php?imag … hotwh5.png

And here is the pic from the nagios webinterface:

img100.imageshack.us/my.php?imag … picgv5.png

What do these return values mean? They are different compared to what I am seeing from the commandline. I have the following command line by nrpe:

command[check_hda1]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/md1 -p /dev/md5 -C -w 20% -c 10% -p /dev/md6 -p /dev/md7 -p /dev/md8

The percentages are different from what is shown on the remote server from the command prompt. Why? Also when I use the following command, nagios throws an error:

What is the difference between the second command, which has “-w 100000 -c 50000”, and the percentage values? Sorry about all of the questions, but I am kinda curious how this works… I have to be able to explain this to my colleagues if they have questions about it. :slight_smile:

Images look OK to me - nagios is reporting free space, df is reporting used space. If you add them together for each partition it comes to 100% every time.
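
As a worked example of why the two numbers are complements, here is a short Python sketch. The figures (1787 MB used, 11241 MB free) are example values only, and it assumes both tools compute their percentage against used plus available space; usage_percentages is a hypothetical helper name:

```python
def usage_percentages(used_mb, free_mb):
    """Return (used_pct, free_pct) relative to used + free space.

    Assumption for this sketch: both percentages are computed against
    used + available space, so the two figures add up to 100.
    """
    usable = used_mb + free_mb
    used_pct = 100.0 * used_mb / usable
    free_pct = 100.0 * free_mb / usable
    return used_pct, free_pct


if __name__ == "__main__":
    used_pct, free_pct = usage_percentages(used_mb=1787, free_mb=11241)
    print(f"df-style used: {used_pct:.0f}%  check_disk-style free: {free_pct:.0f}%")
```

Rounding in either tool can make the displayed figures differ by a point, but the underlying values sum to 100.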

Doesn’t appear to be anything obviously wrong with the use of -w 100000 -c 50000 (this equates to warn at <100MB and crit at <50MB). What is the error? What do you get running the check from the command line?

Here is the command from the nrpe.config:

command[check_hda1]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/md1 -p /dev/md5 -C -w 100000 -c 50000 -p /dev/md6 -p /dev/md7 -p /dev/md8

And the pic from the nagios interface from the command:
img81.imageshack.us/my.php?image … nntci8.png

I am confused about where Nagios is getting this error from. And about the percentages in the Nagios interface: I thought that they are supposed to reflect how much space is remaining on the partitions… I looked at “/home” and Nagios says that it is at “99%”, so I guess that is saying that it is 99% free, but with the command above it gives an error. So is it 99% full or empty? :shock:

Looks like the example in --help is wrong…
[blockquote]Examples:
check_disk -w 10% -c 5% -p /tmp -p /var -C -w 100000 -c 50000 -p /
Checks /tmp and /var at 10% and 5%, and / at 100MB and 50MB[/blockquote]
Seems the integer units are not ~KB as implied above, as clearly -w 100000 is not the same as -w 100MB as demonstrated below…

[root@localhost libexec]# ./check_disk -w 100000 -c 50000 -p /
DISK CRITICAL - free space: / 11241 MB (86% inode=97%);| /=1787MB;-86263;-36263;0;13737
[root@localhost libexec]# ./check_disk -w 100MB -c 500MB -p /
DISK OK - free space: / 11241 MB (86% inode=97%);| /=1787MB;13637;13237;0;13737

More than likely, the units are in fact MB themselves…

[root@localhost libexec]# ./check_disk -w 10000 -c 5000 -p /
DISK OK - free space: / 11241 MB (86% inode=97%);| /=1787MB;3737;8737;0;13737
[root@localhost libexec]# ./check_disk -w 11241 -c 5000 -p /
DISK WARNING - free space: / 11241 MB (86% inode=97%);| /=1787MB;2496;8737;0;13737
[root@localhost libexec]# ./check_disk -w 11240 -c 5000 -p /
DISK OK - free space: / 11241 MB (86% inode=97%);| /=1787MB;2497;8737;0;13737
[root@localhost libexec]#
Therefore I reckon that by using -w 100000 -c 50000 you are in fact setting warn and crit at <100GB free and <50GB free respectively,
so it’s md6 (/var) that’s causing the critical alarm, although md7 (/home) will also be in a warning condition.
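
The performance data after the | in the outputs above points the same way. Assuming the usual perfdata layout label=value[UOM];warn;crit;min;max, and assuming this check_disk build reports the thresholds as used space (total size minus the requested free space), subtracting warn from max gives back the number passed to -w, in MB. A rough Python sketch; parse_perfdata and free_thresholds_mb are hypothetical helper names:

```python
import re

# label=value[UOM];warn;crit;min;max with integer thresholds, as in the
# check_disk outputs quoted in this thread
PERF_RE = re.compile(
    r"^(?P<label>[^=]+)=(?P<value>-?\d+(?:\.\d+)?)(?P<uom>[A-Za-z%]*)"
    r";(?P<warn>-?\d+);(?P<crit>-?\d+);(?P<min>-?\d+);(?P<max>-?\d+)$"
)


def parse_perfdata(item):
    """Split one perfdata item into its fields."""
    match = PERF_RE.match(item.strip())
    if match is None:
        raise ValueError(f"unrecognised perfdata item: {item!r}")
    fields = match.groupdict()
    fields["value"] = float(fields["value"])
    for key in ("warn", "crit", "min", "max"):
        fields[key] = int(fields[key])
    return fields


def free_thresholds_mb(item):
    """Recover the (warn, crit) free-space thresholds from used-space perfdata.

    Assumes warn and crit are expressed as max minus the free-space
    threshold, which matches the outputs in this thread.
    """
    perf = parse_perfdata(item)
    return perf["max"] - perf["warn"], perf["max"] - perf["crit"]


if __name__ == "__main__":
    for item in ("/=1787MB;-86263;-36263;0;13737",   # -w 100000 -c 50000
                 "/=1787MB;3737;8737;0;13737"):      # -w 10000 -c 5000
        print(item, "->", free_thresholds_mb(item))
```

For the first item, 13737 - (-86263) = 100000 and 13737 - (-36263) = 50000, both in the same MB unit as the 13737 MB total, which is consistent with the thresholds being read as MB rather than KB.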

How annoying and confusing is that out of 10… ?

[blockquote]…looked at “/home” and nagios says that it is at “99%”, so I guess that is saying that 99% free, but with the command above it gives an error. So is it 99% full or empty? [/blockquote]
99% empty :slight_smile:

Ah ok, that makes sense. I added the percentages from df and the results from nagios and they all equal 100%. I understand how it works now. Thanks a lot again man, I really appreciate the help. !lol

No worries

All the best

/S