Stop Service Checks When Host Down


#1

I have a load of hosts that are being checked by ping to determine the host status. Each host has 5 or 6 services which check their status via check_nt.

Now when a host goes down (un-pingable), all the services go down too, as there is no route to the host that the services reside on.

So when a host goes down we get at least 6 notifications in one go rather than just one for the host being down.

Is there a way to stop checking services when that host goes down?

I know you can set service dependencies but cannot see how you make a service dependent on a host…

Now I know in some cases services can still be up and monitored even if the host shows down, but in our case if a host goes down then its services are not a priority, and we don’t need to be notified of them being down while the host is down.

Any help would be greatly appreciated

Kind Regards

Scott


#2

We cannot create a dependency between a host and a service using the Nagios dependency engine, but Nagios has decision logic that stops it sending emails for services on a host that is down. Your case looks strange to me, and I don’t see why it doesn’t work.

"So when a host goes down we get at least 6 notifications in one go rather than just one for the host down."
I couldn’t follow the above. Do you mean 6 different notifications for different services, or 6 different emails about the same host-down issue?

Please refer to the below link for more details.
nagios.sourceforge.net/docs/2_0/ … ncies.html


#3

This is the second person reporting the same behaviour…
Could it be a problem with the latest version? Has anybody had a chance to test this?


#4

OK, let me try to explain this some more…

There are 3 host states: UP, DOWN and UNREACHABLE.

We use ping to determine our host status and get notified if a host goes DOWN. We don’t notify if a host becomes UNREACHABLE, because we know that what is blocking access to it is another host, which will show DOWN, and we are notified about that host instead. We are not interested in UNREACHABLE hosts because the host that is DOWN is what needs dealing with first.

Basically we want to do the same with services on a host when that host goes DOWN. We do not want to be notified about a service being DOWN because check_nt cannot reach the service because the host is DOWN.

Now you can set up host dependencies, and hosts become UNREACHABLE when the host they depend on goes DOWN.
Now you can set up service dependencies, and services become UNKNOWN when the service they depend on goes DOWN.
But there are no config options to make a service dependent on a host?..

Am I correct in this thinking?

What I’m looking at doing is when a host goes down the services on that host become UNKNOWN rather than DOWN so that I can filter out UNKNOWN notifications for services.

So I would guess that, because there is no config way of doing what I’m after, I would have to set up some sort of command/macro so that when a host goes DOWN it disables all service checks on that host and sets all service statuses to UNKNOWN. Then, when the host comes back UP, the service checks are re-enabled and the services are forced to do a check.
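To make that idea a bit more concrete, here is a minimal Python sketch of a host event handler that writes Nagios external commands. It is only a sketch under assumptions, not a tested solution: the command file path, the helper names `build_commands` and `submit`, and the plugin output text are all made up for illustration. The external commands themselves (`DISABLE_HOST_SVC_CHECKS`, `ENABLE_HOST_SVC_CHECKS`, `SCHEDULE_FORCED_HOST_SVC_CHECKS`, `PROCESS_SERVICE_CHECK_RESULT`) are standard Nagios ones, and the handler is assumed to be called with the `$HOSTNAME$`, `$HOSTSTATE$` and `$HOSTSTATETYPE$` macros followed by an optional list of service descriptions.

```python
import sys
import time

# Assumed path; the real one is whatever command_file is set to in nagios.cfg.
COMMAND_FILE = "/usr/local/nagios/var/rw/nagios.cmd"

UNKNOWN = 3  # plugin return code for UNKNOWN


def build_commands(host, state, state_type, services=(), now=None):
    """Return external command lines for a host state change (hypothetical helper)."""
    now = int(time.time()) if now is None else int(now)
    if state_type != "HARD":
        # Ignore SOFT states while Nagios is still retrying the host check.
        return []
    if state in ("DOWN", "UNREACHABLE"):
        lines = [f"[{now}] DISABLE_HOST_SVC_CHECKS;{host}\n"]
        # Optionally mark the listed services UNKNOWN via passive results.
        for service in services:
            lines.append(
                f"[{now}] PROCESS_SERVICE_CHECK_RESULT;{host};{service};"
                f"{UNKNOWN};Host is {state}, service not checked\n"
            )
        return lines
    if state == "UP":
        # Re-enable active checks and force an immediate re-check.
        return [
            f"[{now}] ENABLE_HOST_SVC_CHECKS;{host}\n",
            f"[{now}] SCHEDULE_FORCED_HOST_SVC_CHECKS;{host};{now}\n",
        ]
    return []


def submit(lines, command_file=COMMAND_FILE):
    """Write the command lines to the Nagios external command file."""
    if lines:
        with open(command_file, "w") as handle:
            handle.writelines(lines)


if __name__ == "__main__" and len(sys.argv) >= 4:
    # Expected arguments: $HOSTNAME$ $HOSTSTATE$ $HOSTSTATETYPE$ [service ...]
    submit(build_commands(sys.argv[1], sys.argv[2], sys.argv[3], sys.argv[4:]))
```

Two caveats with this approach: external commands have to be enabled in the Nagios configuration, and the passive UNKNOWN results will only be accepted for services that allow passive checks. Without the passive results, disabling the checks just freezes each service at its last known state.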

Thank you for your input, kosarajudeepak. I know what you are saying, and yes, I can stop email notifications going out based on host status, but we use various tools that read Nagios, and these tools show a lot of RED when a host goes down because the services on that host are also DOWN.

If I’m wrong then please correct me. If I’m not, has anyone got any help on how I would set up these commands and macros to do what I am after?

Kind Regards

Scott


#5

What I meant is that you should NOT be getting notifications for services on DOWN/UNREACHABLE hosts.
In the tactical view services are separated as down and down (on trouble hosts) - or something similar, so probably a division of the sort you are looking for already exists.


#6

Luca, I think you are misunderstanding me. It’s not so much that notifications are a problem; it is that I don’t want services to show as DOWN when the host goes DOWN.

In our case the service status should be flagged as UNKNOWN rather than DOWN when a host goes DOWN. Now we can create host dependencies and service dependencies, but why not services dependent on hosts?

Imagine this scenario. A server (host) has HTTP, IMAP, POP3 and SMTP (services) running. This server is connected to a switch, and Nagios is also plugged into the switch. The switch goes DOWN. Nagios can no longer see the server, as the server is dependent on the switch. The server is then given the status of UNREACHABLE and the switch the status of DOWN. Now Nagios continues to run service checks on the server and sets all the service statuses to DOWN, but they are not really DOWN. They are really UNKNOWN, because Nagios has no route to see the services on the server.

Now imagine the above but with 100 servers or hosts on the same switch, each with 5 services. When the switch dies, all 100 servers or hosts show as UNREACHABLE, but Nagios still tries to check services on those hosts despite the hosts being UNREACHABLE. If we have notifications set up in their simplest form, we get 500 service DOWN notifications.
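The numbers in that scenario can be sketched in a few lines of Python. This is only a toy model of the parent/child reachability idea, not how Nagios is implemented, and the host names and the `host_state` helper are invented for the example.

```python
def host_state(name, parents, pingable):
    """Classify one host as UP, DOWN or UNREACHABLE (toy model)."""
    if name in pingable:
        return "UP"
    parent = parents.get(name)
    # A host behind a parent that is itself not UP cannot be judged directly.
    if parent is not None and host_state(parent, parents, pingable) != "UP":
        return "UNREACHABLE"
    return "DOWN"


# Hypothetical layout: one switch with 100 servers behind it, 5 services each.
parents = {"switch": None}
parents.update({f"server{i:03d}": "switch" for i in range(1, 101)})
SERVICES_PER_HOST = 5

pingable = set()  # the switch has died, so nothing answers ping
states = {name: host_state(name, parents, pingable) for name in parents}

unreachable = [name for name, state in states.items() if state == "UNREACHABLE"]
failed_service_checks = len(unreachable) * SERVICES_PER_HOST
print(states["switch"], len(unreachable), failed_service_checks)
# prints: DOWN 100 500
```

So there is one real problem (the switch), but 500 service checks fail behind it unless something stops them being run or reported.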

As kosarajudeepak has already pointed out, you cannot set up dependencies between services and hosts, but surely there is a way to stop Nagios checking services on hosts that are DOWN or UNREACHABLE.

Am I making sense or am I talking a load of rubbish? Surely I’m not the only one that can see the problem here…


#7

As I said:

Services should not be checked and should NOT send notifications when a host is DOWN/UNREACHABLE. You are the second one reporting this, so it looks like a problem in the latest release(s).
This means you wouldn’t need a service-to-host dependency, as it SHOULD be implicit in the system’s logic, and it was until some time ago. While I worked I don’t remember a single notification for a service on a down/unreachable host.

As far as the interface is concerned have a look at my previous post.


#8

I’ve now used 2 versions:

3.0.6 and 3.1.2

Both continue to check services if the host the services reside on becomes DOWN or UNREACHABLE.

I can’t believe this is a fault in version 3, as there would be way more people saying there is a problem. And if it was a problem, then surely it would have been fixed in the latest releases of version 3…