I have a problem with monitoring disks on a couple of servers.
check_nrpe version 2.12
check_disk version 1.4.2
Linux version 2.6.5-7.244-bigsmp (geeko@buildhost) (gcc version 3.3.3 (SuSE Linux)) #1 SMP Mon Dec 12 18:32:25 UTC 2005
This is how the servers' disks are set up:
Filesystem Size Used Avail Use% Mounted on
/dev/sda2 3.0G 611M 2.3G 22% /
tmpfs 2.0G 8.0K 2.0G 1% /dev/shm
/dev/sda6 9.4G 4.2G 4.8G 47% /opt
/dev/sda8 2.2G 960M 1.2G 46% /tmp
/dev/sda7 2.5G 703M 1.7G 30% /usr
/dev/sda5 49G 20G 27G 43% /var
194.68.232.154:/vol/vol1/data/prod
143G 110G 33G 78% /sfs
We have this line in nrpe.cfg:
command[check_diskspace]=/usr/local/nagios/libexec/check_disk -w 15% -c 10% -e -x /dev/shm
The output looks just fine when running the command exactly as above on the local machine, but when we run it through check_nrpe it comes out like this:
DISK OK| /=610MB;2572;2723;0;3026 /opt=4248MB;8137;8616;0;9574 /tmp=959MB;1903;2015;0;2239 /usr=702MB;2145;2271;0;2524 /var=20083MB;41982;44451;0;49391 /sfs=112492MB;124032;131328;0;145920
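To compare the local and NRPE results field by field rather than by eye, the performance data after the `|` can be parsed. This is only a minimal sketch and not something from the thread: it assumes the usual plugin perfdata layout `label=value[UOM];warn;crit;min;max`, unquoted labels without spaces, and plain numeric thresholds. The names `Perf` and `parse_perfdata` are made up for illustration.

```python
import string
from typing import Dict, NamedTuple, Optional


class Perf(NamedTuple):
    value: float
    unit: str
    warn: Optional[float]
    crit: Optional[float]
    minimum: Optional[float]
    maximum: Optional[float]


def _num(text: str) -> Optional[float]:
    # Empty threshold fields are allowed in perfdata; map them to None.
    return float(text) if text else None


def parse_perfdata(line: str) -> Dict[str, Perf]:
    """Parse 'TEXT| label=value[UOM];warn;crit;min;max ...' into a dict.

    Assumption: labels are unquoted and thresholds are plain numbers,
    as in the check_disk output quoted above.
    """
    _, _, perf = line.partition("|")
    result = {}
    for token in perf.split():
        label, _, fields = token.partition("=")
        # Pad to five fields so missing trailing values become None.
        parts = (fields.split(";") + [""] * 5)[:5]
        # Strip the unit suffix (e.g. "MB") from the value.
        number = parts[0].rstrip(string.ascii_letters + "%")
        unit = parts[0][len(number):]
        result[label] = Perf(float(number), unit, *map(_num, parts[1:]))
    return result
```

For the line above, `parse_perfdata(...)["/"]` gives a value of 610 MB with a maximum of 3026 MB. The warning threshold of 2572 MB is about 85% of 3026 MB, which matches `-w 15%` free space.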
That’s messed up! Other than your current workaround of checking specific partitions, all I can suggest is making sure the versions of NRPE and OpenSSL are the same on the check_nrpe side and the client side.
Server: nrpe version 2.8.1 - openssl version 0.9.8a
Client: nrpe version 2.8.1 - openssl version 0.9.7d
Reference client: nrpe version 2.8.1 - openssl version 0.9.7e
So both the server and the reference client, on which I can’t recreate the problem, have a newer version of openssl than the problem client. Unfortunately we cannot update the client “just like that” since it is in a production environment at one of our customers’ sites. But unless we find another solution I will try to make it happen somehow.
Also, another question: sometimes you get random crap at the end of the output when an array isn’t terminated properly. Did you compile your nrpe or check_disk? If so, did you compile the binaries for check_disk and nrpe on a different box than for your reference client? Could be some wonky little library difference if everything wasn’t compiled on the same machine.
If your installs are package based rather than compiled, did you use a different package for the two problem machines?
If your reference machine is of the same distribution, have you tried copying its check_disk and/or nrpe binary to a problem box and trying that?
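One way to see whether the NRPE output really carries stray bytes, or simply differs from the local output at some point, is to compare the two captures as raw bytes. The following is a rough sketch, not part of anyone's setup in this thread; `first_difference` and `non_printable` are hypothetical helper names.

```python
from typing import List, Tuple


def first_difference(local: bytes, remote: bytes) -> int:
    """Return the index of the first differing byte, or -1 if identical.

    If one capture is a prefix of the other, the index is the length
    of the shorter one.
    """
    for i, (a, b) in enumerate(zip(local, remote)):
        if a != b:
            return i
    if len(local) == len(remote):
        return -1
    return min(len(local), len(remote))


def non_printable(data: bytes) -> List[Tuple[int, int]]:
    """List (offset, byte) pairs outside printable ASCII, ignoring newlines."""
    return [
        (i, b)
        for i, b in enumerate(data)
        if b != 0x0A and not 0x20 <= b <= 0x7E
    ]
```

The two byte strings could be captured by redirecting the output of the local check_disk run and of the check_nrpe run to files and reading them back in binary mode. A non-empty result from `non_printable` on the NRPE capture would point toward the unterminated-buffer theory.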
We compiled our own package on the server I used as reference, so it’s the same package on all our servers with this distribution. We’re going to try compiling it on one of the problematic servers instead and see if it makes a difference.