[1224282426] SERVICE NOTIFICATION: test;google.com;check_url;CRITICAL;notify-service-by-email;(Return code of 255 is out of bounds)
Any idea?
Here are my entries.
hosts.cfg
define host{
host_name google.com ; The name of this host template
alias google.com
use generic-host ; This template inherits other values from the generic-host template
check_period 24x7 ; By default, Linux hosts are checked round the clock
check_interval 1 ; Actively check the host every minute
retry_interval 1 ; Schedule host check retries at 1 minute intervals
max_check_attempts 10 ; Check each Linux host 10 times (max)
contact_groups admins ; Notifications get sent to the admins by default
}
define service{
use local-service ; Name of service template to use
host_name google.com
service_description check_url
check_command check_url!
check_period 24x7
check_interval 1
retry_interval 1
notification_options w,u,c,r
notification_interval 10
contact_groups test
notification_period 24x7
notifications_enabled 1
}
You’re missing the value for $ARG1$ in the service’s check_command:
check_command check_url!
The argument the plugin expects has to follow the exclamation mark.
Have you tried to run the plugin from the terminal?
define command{
command_name check_url
command_line $USER1$/check_url.pl $ARG1$
}
and services.cfg
define service{
use local-service ; Name of service template to use
host_name aetv-aneweb01.aeboxes.com
service_description check_url
check_command check_url!$ARG1$!
check_period 24x7
check_interval 1
retry_interval 1
notification_options w,u,c,r
notification_interval 10
contact_groups test
notification_period 24x7
notifications_enabled 1
}
define service{
use local-service ; Name of service template to use
host_name google.com
service_description check_url
check_command check_url!google.com
check_period 24x7
check_interval 1
retry_interval 1
notification_options w,u,c,r
notification_interval 10
contact_groups test
notification_period 24x7
notifications_enabled 1
}
That -v switch in here: I don’t know what it represents, but it expects an argument. I left it out of the example above just to make sure it works. If that example works, leave it that way.
And that $ARG1$ should be an actual value in the service definition (like here):
not literally $ARG1$. I don’t really know which value, though, because, as I said earlier, I don’t know what the -v switch represents, and the number 20 is just a dummy example.
To clarify, ! is the argument delimiter Nagios uses to pass the arguments from the service object definition to the check_command object. Therefore, given such a command object definition as
define command{
command_name check_foo
command_line $USER1$/check_whatever.pl -a $ARG1$ -b $ARG2$ -c $ARG3$
}
and the following variable in the service check object
check_command check_foo!larry!curly!mo
Nagios will map the values given after the ! delimiters to the $ARGx$ macros in order, so
$ARG1$=larry
$ARG2$=curly
$ARG3$=mo
and thus effectively run the command as if it was seen at the CLI as
check_whatever.pl -a larry -b curly -c mo
So, if you’re using
command_line $USER1$/check_url.pl $ARG1$
and
check_command check_url!google.com
nagios will therefore try to run
check_url.pl google.com
which is what you tested as working from the cli…
[blockquote]bash-3.00# …/…/libexec/check_url.pl google.com
OK: 200 OK
bash-3.00#[/blockquote]
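The mapping described above can be sketched in a few lines of Python. This is only an illustration of the substitution rule, not how Nagios implements it: `expand_command` is a hypothetical helper, it leaves other macros such as $USER1$ untouched, and it assumes a macro with no matching value expands to an empty string.

```python
import re


def expand_command(check_command, command_line):
    """Fill the $ARGn$ macros in command_line from a !-delimited check_command."""
    # First field is the command name; the remaining fields are ARG1, ARG2, ...
    _name, *args = check_command.split("!")

    def replace(match):
        index = int(match.group(1)) - 1
        # Assumption: a macro without a matching value becomes an empty string
        return args[index] if 0 <= index < len(args) else ""

    return re.sub(r"\$ARG(\d+)\$", replace, command_line)


print(expand_command("check_url!google.com", "$USER1$/check_url.pl $ARG1$"))
print(expand_command("check_url!", "$USER1$/check_url.pl $ARG1$"))
```

The second call mirrors the original `check_url!` entry: nothing follows the delimiter, so the plugin would be run with no URL at all.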
Well, hopefully that goes some way to explaining what is occurring there. Secondly, it’s important to note that Nagios runs the command as the nagios user and group (nominally nagios and nagios respectively), so your plugin must have permissions that allow this. Make sure the permissions on check_url.pl are correct, i.e. the same as the rest of your check plugins, like
[blockquote][root@localhost libexec]# ll check_tcp
-rwxr-xr-x 1 nagios nagios 105299 Sep 22 12:09 check_tcp <-- needs to be like this
[root@localhost libexec]# ll check_url.pl
-rw-r--r-- 1 root root 0 Oct 24 10:31 check_url.pl <-- not like this
[root@localhost libexec]# chown nagios:nagios check_url.pl
[root@localhost libexec]# chmod 755 check_url.pl
[root@localhost libexec]# ll check_url.pl
-rwxr-xr-x 1 nagios nagios 0 Oct 24 10:31 check_url.pl <-- Hurrah!
[root@localhost libexec]# [/blockquote]
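If you would rather check this from a script than read the ll output by eye, something like the following may help. It is a rough sketch: `plugin_mode_ok` is a made-up helper name, and it only tests for the read and execute bits that mode 755 grants to user, group and others. It does not look at file ownership.

```python
import os
import stat

# Read + execute for user, group and others, i.e. the r-x parts of mode 755
REQUIRED_BITS = (
    stat.S_IRUSR | stat.S_IXUSR
    | stat.S_IRGRP | stat.S_IXGRP
    | stat.S_IROTH | stat.S_IXOTH
)


def plugin_mode_ok(path):
    """Return True if everyone can read and execute the file at path."""
    mode = os.stat(path).st_mode
    return (mode & REQUIRED_BITS) == REQUIRED_BITS
```

With the listing above, this would report False for check_url.pl before the chmod and True afterwards.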
Hi Strides and Albin. Thank you very much for the updates.
I have updated the files, but errors are still being sent.
Warning: Return code of 255 for check of service ‘check_url’ on host ‘google.com’ was out of bounds.
SERVICE NOTIFICATION: admins;google.com;check_url;CRITICAL;notify-service-by-email;(Return code of 255 is out of bounds)
define service{
use local-service ; Name of service template to use
host_name google.com
service_description check_url
check_command check_url!google.com
check_period 24x7
check_interval 1
retry_interval 1
notification_options w,u,c,r
notification_interval 10
contact_groups admins
notification_period 24x7
notifications_enabled 1
}
define host{
host_name google.com ; The name of this host template
alias google.com
use generic-host ; This template inherits other values from the generic-host template
check_period 24x7 ; By default, Linux hosts are checked round the clock
check_interval 1 ; Actively check the host every minute
retry_interval 1 ; Schedule host check retries at 1 minute intervals
max_check_attempts 10 ; Check each Linux host 10 times (max)
check_command check_ssh ; Default command to check Linux hosts
notification_interval 120 ; Resend notifications every 2 hours
notification_options d,u,r ; Only send notifications for specific host states
contact_groups admins ; Notifications get sent to the admins by default
}
Then it must be something failing inside the script when Nagios runs it, but I can’t see what…
When the script runs in Nagios, do the temp files (tmp_html, tmp_res, tmp_res1) get created OK? (You will have to comment out the line that rm’s these files at the end of the script, as it will delete them otherwise… well, assuming it gets that far…)
In fact, looking at it, I get the feeling that it is exiting in an ‘UNKNOWN’ state for some reason. Unfortunately the script writer decided to make this exit with an exit code of -1, which nobody likes: exit statuses are a single byte, so -1 reaches Nagios as 255, and you end up with the 255 return code out of bounds error… If you make the script use the proper ‘UNKNOWN’ exit code of 3, at least that side of things will be correct.
[blockquote]
my %ERRORS = ('UNKNOWN' , '3',
'OK' , '0',
'WARNING', '1',
'CRITICAL', '2');
[/blockquote]
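To see the -1 to 255 effect for yourself without touching the plugin, here is a small sketch. It uses a throwaway Python child process as a stand-in for check_url.pl (an assumption for illustration only) and assumes a POSIX system; `run_and_classify` is a hypothetical name.

```python
import subprocess
import sys

# The only return codes Nagios accepts from a plugin
NAGIOS_STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def run_and_classify(argv):
    """Run a command and report its exit status the way the parent sees it."""
    code = subprocess.run(argv).returncode
    return code, NAGIOS_STATES.get(code, "out of bounds")


# Stand-in for a plugin that exits with -1
bad_plugin = [sys.executable, "-c", "import sys; sys.exit(-1)"]
# Stand-in for a plugin that exits with the proper UNKNOWN code
good_plugin = [sys.executable, "-c", "import sys; sys.exit(3)"]

print(run_and_classify(bad_plugin))
print(run_and_classify(good_plugin))
```

The first child asks for -1, but the parent only ever receives the low byte of the status, which is why the log shows 255 rather than -1.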
Why it is exiting “unknown” is a mystery… Maybe finding out whether those 3 files are created will shed some light on it…
I was banging my head against the wall for a few hours on this one today during a 2.x to 3.x Nagios migration. It turns out the problem is that wget cannot write to the location where it tries to store the temp files when executed by Nagios. That’s why it was tricky to debug: when running the command by hand, everything worked fine because the files went into your working directory.
There is probably a more appropriate fix for this, but for now I just added:
chdir('/tmp');
before any of the wget files are pulled down. I know this is an old post, but hopefully this helps someone!
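A different way to get the same effect, if you would rather not edit the plugin, is to launch it from a wrapper that sets the working directory. The sketch below is not the fix used above, just an illustration of the idea: `run_in_writable_dir` is a hypothetical helper, and it assumes the system temp directory is where you want the files to land.

```python
import os
import subprocess
import tempfile


def run_in_writable_dir(argv, workdir=None):
    """Run argv with its working directory set to a writable location."""
    workdir = workdir or tempfile.gettempdir()
    # Fail early with a clear message instead of an obscure plugin error
    if not os.access(workdir, os.W_OK):
        raise PermissionError("cannot write to " + workdir)
    return subprocess.run(argv, cwd=workdir, capture_output=True, text=True)
```

Anything the child writes with a relative path, such as temp files from wget, then ends up in that directory instead of wherever the parent process happened to start.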