Stumped on a Nagios/NRPE problem


#1

I’m going to give as much info as possible for this problem. Basically, I’m getting the NRPE "Command not defined" error. The odd part is that the check does work from the command line, just not from within Nagios.

I’ve tried it from the command line on the monitoring box:

[root@nms001 libexec]# ./check_nrpe -n -H muriel.lax06.dti -c check_db
DISK OK - free space: /db 116883 MB (43% inode=99%);| /db=150415MB;264707;273155;0;281604

I’ve double-checked the nrpe.cfg file on the client box, run the plugin locally, and restarted the nrpe process:

command[check_db]=/usr/nagios/check_disk -w 6% -c 3% /db

[root@muriel nagios]# /usr/nagios/check_disk -w 6% -c 3% /db
DISK OK - free space: /db 116875 MB (43% inode=99%);| /db=150424MB;264707;273155;0;281604

[root@muriel nagios]# service nrpe restart
Shutting down nrpe: [  OK  ]
Starting nrpe: [  OK  ]
[root@muriel nagios]#
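
For anyone retracing this step, here is a minimal sketch of one way to list the command names an nrpe.cfg defines and compare them with the names Nagios is asking for. The function names `defined_commands` and `missing_commands` are made up for illustration, and the sketch assumes every definition sits on a single line in the `command[name]=...` form shown above.

```python
import re

# Matches lines such as:
#   command[check_db]=/usr/nagios/check_disk -w 6% -c 3% /db
# Commented-out lines ("#command[...]") do not match, because the
# pattern requires the line to start with "command[".
COMMAND_LINE = re.compile(r"^\s*command\[([^\]]+)\]=(.*)$")


def defined_commands(cfg_text):
    """Return {command_name: command_line} for each command[...] entry."""
    commands = {}
    for line in cfg_text.splitlines():
        match = COMMAND_LINE.match(line)
        if match:
            commands[match.group(1)] = match.group(2).strip()
    return commands


def missing_commands(cfg_text, requested):
    """Return the requested command names that cfg_text does not define."""
    defined = defined_commands(cfg_text)
    return [name for name in requested if name not in defined]


if __name__ == "__main__":
    sample = "command[check_db]=/usr/nagios/check_disk -w 6% -c 3% /db\n"
    # check_db is defined in the sample, check_db2 is not.
    print(missing_commands(sample, ["check_db", "check_db2"]))  # ['check_db2']
```

If this reports nothing missing on the client while Nagios still says "not defined", the definitions themselves are probably not the problem.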

I also tried clearing all my Nagios log files and restarting Nagios, just in case that somehow caused the problem. I still get the same results. Even though the checks are clearly defined, whenever Nagios tries one it reports the command as not defined. The only checks that work are the load average check and an sshd process check. Also, I killed NRPE on the client box, and Nagios kept reporting "Command not defined" instead of an "NRPE not responding" message.

These service checks are generic and are reused on at least 200 other servers, where they work fine. This server is the ONLY one having this weird problem. I wonder if Nagios is somehow keeping stale service check information for the failing services, which is why I tried deleting all the log files. If anyone has a clue, I’d much appreciate it.

Thanks,
-hd

muriel.lax06.dti

/db check

CRITICAL 	07-24-2007 11:03:52 	0d 0h 20m 34s 	3/3 	NRPE: Command 'check_db' not defined 

/db2 check

CRITICAL 	07-24-2007 11:03:52 	0d 0h 20m 34s 	3/3 	NRPE: Command 'check_db2' not defined 

DISK-HOME

CRITICAL 	07-24-2007 11:03:52 	0d 0h 20m 34s 	2/2 	NRPE: Command 'check_home' not defined 

DISK-ROOT

CRITICAL 	07-24-2007 11:03:53 	0d 0h 20m 34s 	2/2 	NRPE: Command 'check_root' not defined 

DISK-SPARE

CRITICAL 	07-24-2007 11:03:53 	0d 0h 20m 34s 	2/2 	NRPE: Command 'check_spare' not defined 

DISK-USR

CRITICAL 	07-24-2007 11:03:53 	0d 0h 20m 34s 	2/2 	NRPE: Command 'check_usr' not defined 

DISK-VAR

CRITICAL 	07-24-2007 11:03:53 	0d 0h 20m 34s 	2/2 	NRPE: Command 'check_var' not defined 

LOADAVE

OK 	07-24-2007 11:03:53 	0d 0h 20m 34s 	1/3 	OK - load average: 0.00, 0.00, 0.00 

PROCS-CRON

CRITICAL 	07-24-2007 11:03:53 	0d 0h 20m 34s 	3/3 	NRPE: Command 'check_cron' not defined 

PROCS-SSHD

OK 	07-24-2007 11:03:53 	0d 0h 20m 34s 	1/3 	PROCS OK: 1 process with command name 'sshd' 

PROCS-SYSLOG-NG

CRITICAL 	07-24-2007 11:03:53 	0d 0h 20m 34s 	3/3 	NRPE: Command 'check_syslog-ng' not defined

#2

Alright, I got it working.

Thanks to milandred and his post for giving me the idea:
meulie.net/portal_plugins/fo … c.php?8693

I still don’t understand why it worked, but when I switched the monitored host’s address from an IP address to a hostname, all the checks worked fine.
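
One possible explanation, and it is only a guess: the IP address in the host definition belonged to a different machine, so Nagios was querying some other box's NRPE daemon. That would fit the fact that killing NRPE on muriel did not change the error. A rough sketch of how the configured address could be compared against what the hostname resolves to is below. The address `192.0.2.10` is a placeholder, not the real one, and `address_matches_hostname` is a name made up for illustration.

```python
import socket


def address_matches_hostname(configured_address, hostname,
                             resolver=socket.gethostbyname_ex):
    """Return True if configured_address is one of hostname's IPv4 addresses.

    resolver defaults to socket.gethostbyname_ex, which returns
    (canonical_name, alias_list, ip_address_list).
    """
    _canonical, _aliases, addresses = resolver(hostname)
    return configured_address in addresses


if __name__ == "__main__":
    hostname = "muriel.lax06.dti"
    configured_address = "192.0.2.10"  # placeholder: use the address from the Nagios host definition
    try:
        matches = address_matches_hostname(configured_address, hostname)
    except OSError as err:
        # Resolution failures (socket.gaierror, socket.herror) are OSError subclasses.
        print(f"could not resolve {hostname}: {err}")
    else:
        if matches:
            print("configured address matches the hostname")
        else:
            print("MISMATCH: Nagios may be querying a different machine")
```

A mismatch here would mean the "not defined" replies were coming from another host's nrpe.cfg, not the one that was edited.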

-hd