Checking http problem!

geps · January 2, 2007, 3:24pm

Hi!
I’m new here, and I have a very annoying problem with the check of an HTTP service (mysite.org in the example; the real web server is obviously a different one). My Nagios 1.3 installation uses the configuration attached below. The host is not pingable. The problem is that the host is often reported as down while the service is perfectly up.
I have the following line on the service page:

and this is on the host page:

As you can see, I get a notification of a critical problem while the service is actually up. This occurs about once or twice a day.
Why do I get this behavior, and how can I avoid it?
Thank you in advance for your attention, bye!
GePs

PS: here is the config:

[code]services.cfg:

define service{
        use                     generic-service
        host_name               mysite.org
        service_description     HTTP
        is_volatile             0
        check_period            24x7
        max_check_attempts      3
        normal_check_interval   2
        retry_check_interval    1
        contact_groups          nt-admins
        notification_interval   240
        notification_period     24x7
        notification_options    w,u,c,r
        check_command           check_http
        name                    t-http
        }

hosts.cfg:

define host{
        use                     generic-host
        host_name               mysite.org
        alias                   Web Portal
        address                 mysite.org
        check_command           check-host-alive
        max_check_attempts      10
        notification_interval   60
        notification_period     24x7
        notification_options    d,u,r
        name                    t-web
        }
[/code]

jakkedup · January 4, 2007, 5:28pm

Sounds like you have host checks being scheduled, which you should not do. To disable regular checks of a host, set the check_interval directive in the host definition to 0:
nagios.sourceforge.net/docs/2_0/tuning.html

geps · January 6, 2007, 6:37pm

Here is what I get after adding check_interval 0 to the host definition:

[code]/etc/nagios# /etc/init.d/nagios restart
Stopping nagios: nagios.
Starting nagios:
Nagios 1.3
Copyright (c) 1999-2004 Ethan Galstad ([email protected])
Last Modified: 10-24-2004
License: GPL

Reading configuration data…

Error: Invalid host object directive ‘check_interval’.

Error: Could not add object property in file ‘/etc/nagios/hosts.cfg’ on line 271.[/code]

jakkedup · January 6, 2007, 6:52pm

Sorry, I just noticed you are using Nagios 1.x. Didn’t you catch that, since the URL was for 2.x?
So remove that directive, since you can’t schedule host checks in 1.x.
When you get the error, look at the Host State Information and the Service State Information pages for the host/service in question.
Look at what they say for Last State Change: and Current State Duration:
I think you will see that the service check did in fact fail shortly before, which triggers a host check. As you stated, you already know that your host check is going to fail (why you leave it that way, I fail to understand). You need to fix your host check. If for some reason you are not able to ping the host (ping disabled), then make the host check something that does make sense, like check-host-httpd.
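
As a rough sketch of what a host check like check-host-httpd could look like: the command name is only an illustrative one (it is not a stock command), and the definition assumes that $USER1$ points at your plugin directory in resource.cfg and that the check_http plugin is installed there. Adjust the options to your site.

[code]checkcommands.cfg:

# Hypothetical host check command: tests the web server instead of pinging
define command{
        command_name    check-host-httpd
        command_line    $USER1$/check_http -H $HOSTADDRESS$
        }
[/code]

With a command like this, the host counts as up whenever the web server answers, so a blocked ping no longer matters.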

geps · January 8, 2007, 10:01am

Sorry, I don’t understand which directive I should remove from my configuration…

jakkedup · January 8, 2007, 11:38am

geps · January 8, 2007, 2:18pm

OK, you were writing about that directive… but now I don’t understand how I can change my configuration to avoid the behavior I described in the first message…
Thank you.

jakkedup · January 8, 2007, 4:20pm

I don’t understand your problem I guess.
You say you have a host that goes down, but the service does not. How is that even possible?
So please spell it out so I can understand it, maybe like this:
“We have a host that goes physically down, and Nagios does detect it. But my problem is that Nagios still thinks the service is up.” Or something like that.

The way I understand your problem is like this:
“You have a host on which you have disabled replies to pings, so you already know that the host check “check-host-alive” will fail every time. Your problem is that, at times, you get a host down alert while the Nagios status for the service is up.”
If the above assumption is correct, then please see what I stated before:
“When you get the error, look at the Host State Information and the Service State Information pages for the host/service in question.
Look at what they say for Last State Change: and Current State Duration:
I think you will see that the service check did in fact fail shortly before, which triggers a host check. As you stated, you already know that your host check is going to fail (why you leave it that way, I fail to understand). You need to fix your host check. If for some reason you are not able to ping the host (ping disabled), then make the host check something that does make sense, like check-host-httpd.”

To understand why the above makes sense, you should know that Nagios does NOT ever run a host check unless a service check has failed. When that happens, Nagios runs the host check. Since you already know the host check is going to fail, of course you are going to get a “host down alert” via email.

geps · January 8, 2007, 5:49pm

Sorry, I think it is due to my poor English.
Now I understand that I should change check-host-alive to check_http… so now I have two check_http entries, one in hosts.cfg and one in services.cfg… I don’t know if that’s correct. Thank you for your patience…

jakkedup · January 8, 2007, 6:10pm

First off, tell me this: why will this device not respond to a ping?
If for some reason you have disabled that ability, then you need to make your “host check” something that will respond.
For example:
I’m behind a firewall, so a host like “google.com” is NOT going to reply to a “check-host-alive” because my firewall is blocking ping. So, in order for Nagios to be any good at all, I need to make my “host check” something that WILL WORK. So I make the “host check” check_http, because I KNOW THAT WILL WORK. The service check that runs every 5 minutes will be check_http, and if that fails, then the host check that is run will be check_http also.

This is not uncommon, since many times a service check is nothing more than a check_ping, and if that fails, the host check is check_ping also, just named check-host-alive.
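
Applied to the host definition from the first post, the change might look roughly like the sketch below. It assumes your check_http command takes no arguments, as in the existing service definition; everything except the check_command line is copied from that post.

[code]hosts.cfg:

define host{
        use                     generic-host
        host_name               mysite.org
        alias                   Web Portal
        address                 mysite.org
        check_command           check_http      ; was check-host-alive, which cannot succeed when ping is blocked
        max_check_attempts      10
        notification_interval   60
        notification_period     24x7
        notification_options    d,u,r
        name                    t-web
        }
[/code]

The service definition in services.cfg stays as it is, so both the service check and the host check end up calling check_http.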

geps · January 9, 2007, 8:12am

Thank you very much, your help has been very valuable: I did not understand that I could replace “check-host-alive” with a call to a plugin.
Thank you again, I will let you know if everything works fine!