Host check or service check?

tsettle · January 30, 2011, 6:14pm

I’m in the process of setting up a new Nagios server, and so far, I have things working, but not the way I want.

I created (edited) the generic host template with a check_command of check-host-alive, then created hosts using that template. When I apply the configuration, Nagios issues warnings that no services are associated with the hosts; however, every host readily changes from PENDING to UP.

Now, back to the services thing… I created a generic service template, again using check-host-alive. I then created a service and set host=* to apply it to all hosts. I don’t care to have duplicate checks, so I removed the check_command from the host template. However, when I check the status, every service is up, but the hosts never change from PENDING.
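For readers following along, the catch-all service described above might look roughly like the sketch below. The template name generic-service and the service_description are assumptions (they are not given in the post), and the wildcard is written with the host_name directive, which is how Nagios object definitions express "all hosts".

```
# Sketch only: a single service applied to every defined host.
# "generic-service" and the description text are assumed names.
define service {
    use                  generic-service     ; assumed service template
    host_name            *                   ; wildcard: every host
    service_description  Host Alive          ; hypothetical label
    check_command        check-host-alive    ; same command as the host check
}
```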

So, back to the host template: I put the check_command back in and restarted Nagios, and every host moved from PENDING to UP. Every service is UP. Beautiful!

The next step was to simulate an outage… the results were undesirable: Nagios sent a notification for both the host and the service. I really don’t need both.

And now for the questions:

If I have both a host check and a service check defined to ping the host, is Nagios actually doing the check twice?

If a host is down and a notification is sent… why would Nagios also send notifications for the services on that host? Does Nagios not understand that if a host is dead, its services are also dead?

Why does Nagios issue warnings if a host doesn’t have a service? At this point, I only care if the host is alive or not. If I omit services, will I run into any problems with reliability or accuracy?

Thanks!

luca · January 30, 2011, 7:23pm

A check-host-alive check is usually run ONLY if a service is down… (no need to check if the host is up if a service is up).
Nagios wants service checks, as that’s what it is made for… it’s only a warning… “hey, possibly you forgot something”

What kind of service checks are you running? Try a check_http or check_ssh service and use check_ping as the check-host-alive command on the host. The host definition doesn’t require a check_command, but if it’s missing I’d expect the host to stay PENDING; not sure, never tried.
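A minimal sketch of that split, assuming the stock sample templates (generic-host, generic-service) and the sample check_ssh and check-host-alive command definitions are present; the host name, address, and contact group are placeholders, not values from this thread:

```
# Sketch only: the host is checked with a ping-based command,
# while the service checks something other than ping.
define host {
    use                 generic-host        ; assumed to supply remaining required directives
    host_name           web01               ; hypothetical host
    alias               web01
    address             192.0.2.10          ; documentation-range placeholder
    check_command       check-host-alive    ; ping-based in the sample config
    max_check_attempts  5
    contact_groups      admins              ; assumed contact group
}

define service {
    use                  generic-service    ; assumed service template
    host_name            web01
    service_description  SSH
    check_command        check_ssh
}
```

With this layout the host and the service are tested by different commands, so the two checks are not duplicates of each other.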

I’d recommend a look at the docs for the host/service definitions and the notification descriptions:
nagios.sourceforge.net/docs/3_0/

tsettle · January 31, 2011, 9:09pm

Well, for the initial configuration, I really didn’t care about checking any services. I just wanted to see if a host was there or not. I only added check-host-alive as a service check to clear the warnings from Nagios.

On the host template, I had left the check interval blank and the initial state unset. After much fussing and googling, I discovered that if I set the initial state to UP and set the check_interval to 0, I’d be in good shape!

Neither of these options is clear in the documentation:

Not true… the default apparently is to have a PENDING state… not up, down, or unreachable.

Nothing in here about leaving it unset and defaulting to 5 minutes… setting it to 0 did the trick.

Finally, I think I have a workable template:

define host {
    name                      basic-host
    check_command             check-fping
    initial_state             o
    max_check_attempts        5
    check_interval            0
    retry_interval            1
    active_checks_enabled     1
    passive_checks_enabled    1
    first_notification_delay  7
    ...stuff removed...
}

If I got this right… Nagios won’t check the host until the services are dead; then it will retry the host check up to 5 times, with a 1 minute pause between checks, but a notification won’t be sent out unless the host is down for at least 7 minutes.