NRPE checks work on remote and local, but web output is wrong?


#1

Hello, I have a fresh install of Nagios 3.0. I have it set up and running on a dedicated Nagios
server; let's call it nagios. I have several Linux clients with nagios-plugins and NRPE installed.
I am checking processes, users, load, and disk on the remote clients. The problem is that
the checks return correct values when run on the client and when run by hand from the Nagios
server, but the numbers in the web interface are wrong. For example, from the client:

level42 etc # /usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/sdb1
DISK OK - free space: /space 1633075 MB (45% inode=95%);| /space=1929794MB;3002831;3378185;0;3753539

This is correct. From the server, if I connect to that client:

nagios servers # /usr/local/nagios/libexec/check_nrpe -H level42 -c check_disk
DISK OK - free space: /space 1633071 MB (45% inode=95%);| /space=1929799MB;3002831;3378185;0;3753539

That is correct. But what shows up in the web interface under level42 is:

Partition OK 03-24-2010 13:51:34 0d 0h 50m 56s 1/3 DISK OK - free space: / 29842 MB (82% inode=86%):

The disk does not have 29 GB free; it has 1.6 TB free. The same thing is happening for load, users, and processes.
The checks work on the client and from the server, but on the web page load, users, and processes
are all zero, all the time. I can go to the client and log in twice, and /usr/local/nagios/libexec/check_users -w 5 -c 10
works from the client and the server, but on the web page it is still zero, even after updating.

Here is the server cfg file for the client above. The nrpe.cfg file on the client is stock except for the disk
partition.

define host{
    use                 linux-server
    host_name           level42
    alias               Remote Host
    address             10.1.85.36
    contact_groups      admins
}

define service{
    use                 generic-service
    host_name           level42
    service_description Partition
    contact_groups      admins
    check_command       check_nrpe!check_disk
}

define service{
    use                 generic-service
    host_name           level42
    service_description Users
    contact_groups      admins
    check_command       check_nrpe!check_users
}

define service{
    use                 generic-service
    host_name           level42
    service_description Load
    contact_groups      admins
    check_command       check_nrpe!check_load
}

define service{
    use                 generic-service
    host_name           level42
    service_description Zombie Processes
    contact_groups      admins
    check_command       check_nrpe!check_zombie_procs
}

define service{
    use                 generic-service
    host_name           level42
    service_description Total Processes
    contact_groups      admins
    check_command       check_nrpe!check_total_procs
}

I am not sure why it is not correct on the web interface. Any insight/suggestions would be appreciated.

Thanks.
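
Every service definition above goes through a check_nrpe command object that the post does not show. It may be worth looking at that definition, because the host and arguments in its command_line are what the scheduled checks actually use, and they can differ from what is typed by hand. A minimal sketch, assuming a source install that keeps its command definitions in /usr/local/nagios/etc/objects/commands.cfg (that path and the helper name show_nrpe_command are assumptions, not taken from the post):

```shell
# Hypothetical helper: print the check_nrpe command definition, with line
# numbers and the three lines that follow command_name, from an object file.
show_nrpe_command() {
    # $1 = path to the object file that holds the command definitions
    grep -n -A 3 'command_name[[:space:]]*check_nrpe' "$1"
}

# Usage (the path is an assumption; adjust to the local layout):
#   show_nrpe_command /usr/local/nagios/etc/objects/commands.cfg
```

The sample definition in the NRPE documentation uses -H $HOSTADDRESS$ -c $ARG1$. If the command_line found here differs, for instance a hard-coded host, the scheduled check would not match the manual one.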


#2

Perhaps a stupid suggestion: have you checked whether the values you see are the values from the Nagios server itself?
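
One way to test this suggestion: the manual runs in the thread address the client by name (-H level42), while a scheduled check would use the address from the host definition (10.1.85.36) if the command is defined with $HOSTADDRESS$. If the two do not reach the same machine, the outputs will differ. A rough sketch; compare_targets is a hypothetical helper, and the NRPE variable simply holds the plugin path used in the thread:

```shell
# Path to check_nrpe as used in the thread; override NRPE if it lives elsewhere.
NRPE=${NRPE:-/usr/local/nagios/libexec/check_nrpe}

# Hypothetical helper: run the same NRPE command once by hostname and once by
# the address from the host definition, and print both answers for comparison.
compare_targets() {
    # $1 = hostname typed on the command line
    # $2 = address from the Nagios host definition
    # $3 = NRPE command name, e.g. check_disk
    by_name=$("$NRPE" -H "$1" -c "$3")
    by_addr=$("$NRPE" -H "$2" -c "$3")
    printf 'by name:    %s\nby address: %s\n' "$by_name" "$by_addr"
}

# Usage with the values from the thread:
#   compare_targets level42 10.1.85.36 check_disk
```

If the "by address" line matches what the web interface shows, the address in the host definition probably belongs to a different machine than the one level42 resolves to.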


#3

The server gets the correct values from check_nrpe when I run it from the command line, but when the scheduled
check runs and the result is displayed on the web page, it is incorrect. Again, as an example:

From the client:

level42 ~ # /usr/local/nagios/libexec/check_users -w 5 -c 10
USERS OK - 2 users currently logged in |users=2;5;10;0

From the nagios server:

nagios etc # /usr/local/nagios/libexec/check_nrpe -H level42 -c check_users
USERS OK - 2 users currently logged in |users=2;5;10;0

From the Nagios server web page, after an update, while two people are still logged in:

Users OK 03-25-2010 08:34:43 0d 20h 0m 37s 1/3 USERS OK - 0 users currently logged in

So check_nrpe seems to be working fine from the server, but when Nagios runs the check itself it
does not seem to get the same result. This may be a hint: check_total_procs comes back on
the client at 122, on the server at 122, and on the web page it displays 119. If I run 40 extra processes
continuously, the client reports 162, the server reports 162, and the web page stays at 119. I have
triple-checked that I am checking the same server in my config files. I cannot figure out why
the web page is incorrect for all checks using check_nrpe. I have searched the web, Google, etc.,
and found nothing.

I am using Nagios 3.2.1, nagios-plugins-1.4.14, and nrpe-2.12.