Nagios behaviour with service check, when host is down or un


#1

Hi everyone,

I hope this question hasn’t been asked before; I searched the Nagios documentation and didn’t find an answer.

I just want to know one thing:

If a host goes down or becomes unreachable (HARD state), will Nagios still check every service on that host, or will it skip the service checks to improve performance?

If it skips the service checks, that’s cool, I have nothing to change.
But if not, is there a way to prevent Nagios from checking the services of a host that is down or unreachable?

Sorry for my bad English. Thanks in advance.

Cheers, Eryas.


#2

Hi,

Bad news for you:
Nagios keeps on checking the services and doesn’t even check the host anymore … (which can be a problem, btw!)

Here is a scenario:
  • a service goes down on a host
  • => Nagios tests the host
  • host is found DOWN => Nagios notifies HOST DOWN
  • Nagios keeps on testing the services, and notifying HOST DOWN
  • one service goes UP => the host is assumed to be UP => the HOST UP notification is sent

There is some logic to it, especially since (apparently, I don’t know why) host checks consume more resources than service checks (host checks are not scheduled, so they have to be inserted into the plan).

Hope this is clear enough

btw: my problem is that, even if the host comes back UP (machine is restarted) but no service is UP (monitored process not restarted, or NRPE not restarted), Nagios won’t detect that the host is up and will keep sending the HOST DOWN alert … which is really stupid and confusing for the team that reads the alerts.


#3

Bad news indeed :frowning:

I think I’m going to “hack” Nagios to prevent service checks when the host state is not UP. That’s the only solution, I think.

But as for your problem, I think it’s because you left the Check Interval in the host configuration at its default value of 60 minutes (or active checks aren’t enabled).

Like you said, Nagios executes a host check the first time it runs, on every change in related services, and at regular intervals of 60 minutes (by default, and if active checks aren’t disabled). So if you change the default value to, say, 5 minutes, then even if your services don’t come back UP, Nagios will still check the host’s reachability :D, so your host may change state to UP.


#4

How about using an event handler in the host object to fire the DISABLE_HOST_SVC_CHECKS external command when the host is down (and conversely ENABLE_HOST_SVC_CHECKS when it comes back up) ?

That would fit the requirement I would think…
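
For anyone who wants to try this, here is a minimal sketch of such an event handler in Python. It is not a tested production script, and several things in it are assumptions: the command file path (check `command_file` in your nagios.cfg), and the idea that the host's event handler command passes `$HOSTNAME$`, `$HOSTSTATE$` and `$HOSTSTATETYPE$` as the three arguments. The function name `build_command` is just made up for the example. It only reacts to HARD states, so a single failed check doesn't switch anything off.

```python
#!/usr/bin/env python3
"""Host event handler sketch: toggle service checks when a host changes state."""
import sys
import time

# Assumption: typical source-install location; use the command_file
# value from your own nagios.cfg.
COMMAND_FILE = "/usr/local/nagios/var/rw/nagios.cmd"


def build_command(host_name, host_state, state_type, now=None):
    """Return the external command line to send, or None if nothing to do."""
    if state_type != "HARD":
        return None  # ignore SOFT states (host is still being retried)
    if host_state in ("DOWN", "UNREACHABLE"):
        command = "DISABLE_HOST_SVC_CHECKS"
    elif host_state == "UP":
        command = "ENABLE_HOST_SVC_CHECKS"
    else:
        return None  # unknown state string: do nothing
    timestamp = int(now if now is not None else time.time())
    # External command format: [epoch] COMMAND;argument
    return f"[{timestamp}] {command};{host_name}\n"


def main(argv):
    if len(argv) != 4:
        print("usage: handler <host_name> <host_state> <state_type>",
              file=sys.stderr)
        return 2
    line = build_command(argv[1], argv[2], argv[3])
    if line is not None:
        # The command file is a named pipe that Nagios reads from.
        with open(COMMAND_FILE, "w") as cmd_file:
            cmd_file.write(line)
    return 0


# Only act when arguments were passed (as Nagios would do).
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```

With that, a HARD DOWN for a host called `web01` would write a line like `[<epoch>] DISABLE_HOST_SVC_CHECKS;web01` to the command file, and the HARD UP recovery would write the matching `ENABLE_HOST_SVC_CHECKS` line. Note that the re-enable step depends on Nagios actually re-checking the host, so active host checks need to stay enabled.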


#5

excellent idea!
I’ll try it when I’ve got some time, and see if this solves my problem :slight_smile:


#6

Thanks for the help, going to run some tests :smiley:


#7

OK, it seems to work well, thanks for the tip, Strides.

I just hope the script and the external command are safe and won’t break when the host recovers to the UP state. I can’t leave host services unchecked on a customer network, or I’m a dead man! lol


#8

No worries. Always worth trying these potentially career limiting work-arounds in a test environment first if possible, but hey, I’m sure it’ll work nearly as well as you hope. :wink: