Host displayed as down despite all services OK



I’ve searched for a solution to my problem with no success. The search terms are so common, even when used together, that I’m having a hard time finding answers. Here is my problem:

I have a Nagios Core 3.3.1 install on a RHEL 6.2 server that can do standard check_host_alive tests all day long. In fact, I’ve got it monitoring 137 hosts that way so far. But I know that the real power of Nagios lies within NRPE, and that is where the problem starts. I’ve only been experimenting with one remote host so far, another RHEL 6.2 system. I used a few online tutorials to install and configure NRPE on the remote host with xinetd. I created several custom commands in the nrpe.cfg file as follows:

command[check_root]=/usr/local/nagios/libexec/check_disk -w 10% -c 5% -p /dev/mapper/system-root
command[check_internal1]=/usr/local/nagios/libexec/check_disk -w 5% -c 10% -p /dev/mapper/system-internal1
command[check_tmp]=/usr/local/nagios/libexec/check_disk -w 10% -c 5% -p /dev/mapper/system-tmp
command[check_usr]=/usr/local/nagios/libexec/check_disk -w 10% -c 5% -p /dev/mapper/system-usr
command[check_system-local]=/usr/local/nagios/libexec/check_disk -w 10% -c 5% -p /dev/mapper/system-local
command[check_var]=/usr/local/nagios/libexec/check_disk -w 10% -c 5% -p /dev/mapper/system-var

I created the check_nrpe command on the Nagios server with the following line:

define command {
    command_name check_nrpe
    command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}

I can run the following from the command line on the Nagios server and get the appropriate output:

bash-4.1$ /usr/lib64/nagios/plugins/check_nrpe -H plprdapp -c check_var
DISK OK - free space: /var 7226 MB (94% inode=99%);| /var=427MB;7256;7659;0;8063

And I can run the following (as specified in the nrpe.cfg file from above) from the command line on the remote host and also get the appropriate output:

[[email protected] etc]# /usr/local/nagios/libexec/check_disk -w 10% -c 5% -p /dev/mapper/system-var
DISK OK - free space: /var 7226 MB (94% inode=99%);| /var=427MB;7256;7659;0;8063

The problem is that the host is reported as being down in the Nagios web interface despite all 7 services being OK.

The error reported on the host state information screen reads, “NRPE: Command ‘check_var,’ not defined”

How can this be? check_var works locally and remotely. Can someone point me in the right direction?


I don’t know about your check_var situation (maybe there is a comma in the way?).
The only time I had all services displayed as OK and the host as down was when I was monitoring hosts that did not respond to ping (ping to death), so they always showed as down. I just used another command to check whether they were up. In my case I used check_nrpe without a command, so the NRPE daemon simply replies that it is fine.
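As a rough sketch of that workaround, the fragment below defines a host check that calls check_nrpe with no -c argument, in which case the NRPE daemon normally just answers with its version string. The command name check_nrpe_alive and the linux-server template are assumptions for illustration, not taken from the original configuration; the host name plprdapp comes from the command line shown in the question.

```
# Hypothetical command name: check_nrpe without -c only asks the daemon to respond.
define command {
    command_name    check_nrpe_alive
    command_line    $USER1$/check_nrpe -H $HOSTADDRESS$
}

# Host definition using that command as its host check.
# "linux-server" is an assumed template that supplies the other required directives.
define host {
    use             linux-server
    host_name       plprdapp
    address         plprdapp
    check_command   check_nrpe_alive
    # If the existing check_command reads something like "check_nrpe!check_var,"
    # the trailing comma would be passed to NRPE as part of the command name.
}
```

Since the error text appears on the host state screen, the host's own check_command is probably the first place to look for a stray comma after check_var.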