Nagios notifications and scheduling downtime

keysorsoze · September 29, 2008, 2:35pm

Hi! We currently check roughly 150 devices with around 15 services per device. We regularly need to power down all devices in the data center and cut over to a DR data center. During the outage we schedule downtime for each host. However, when powering down the hosts we still get hammered with tons of service notifications. Am I correct to assume that we need to schedule downtime for both the host and the services on the host? Can Nagios be configured to say “hey, the host is down”, so don’t perform any more service checks and don’t send out notifications for services on a host that is down? Would I need to set up service dependencies? Please advise on the best solution. Also, if it comes down to having to schedule downtime for the hosts as well as the services, what do you recommend for scheduling them in bulk? Service groups? Host groups?

Thanks.

Strides · September 29, 2008, 4:38pm

[blockquote]Am I correct to assume that we need to schedule downtime for both the host and the services on the host? Can Nagios be configured to say “hey, the host is down”, so don’t perform any more service checks and don’t send out notifications for services on a host that is down?[/blockquote]
…yeah, that seems to be the case. If there is a way to make Nagios automagically understand the obvious, i.e. that scheduling downtime for a host means its service checks are going to fail, then I have yet to find it…
The good news is that you can either use hostgroups and do ‘Schedule downtime for all hosts in this hostgroup’ & ‘Schedule downtime for all services in this hostgroup’, or use servicegroups and do ‘Schedule downtime for all hosts in this servicegroup’ & ‘Schedule downtime for all services in this servicegroup’. Both options are only 2 or 3 mouse clicks, so which one you go for is probably going to depend on how “awkward” each would be to implement…
I’d imagine the hostgroup would be easier to achieve. For instance, if all of your 150 hosts need to go down at the same time and they are all based on one or a few templates, define the hostgroup somewhere, then just add the hostgroup variable to the template(s) and it’s job done… Also, this way, if you decide you need to keep 2 or 3 or x hosts monitored (say, because they aren’t among those that go down), you just modify the hostgroup variable in their respective host objects and put them in a second “non-downtime” group, thus overriding the template entry. That would seem, on the face of it, to be less typing IMHO…

Another option would be to use the “external commands” functionality and make use of a couple of commands that would seem to fit the bill…
SCHEDULE_HOST_DOWNTIME
nagios.org/developerinfo/ext … and_id=118
SCHEDULE_HOST_SVC_DOWNTIME
nagios.org/developerinfo/ext … and_id=122
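
For what it's worth, here is a rough sketch of how those two commands could be generated in bulk. It is written in Python rather than shell purely for illustration. It assumes the usual external command line layout (`[timestamp] COMMAND;host;start;end;fixed;trigger_id;duration;author;comment`), that external commands are enabled in your Nagios config, and that the command file lives at the path shown, which is only a common default (check `command_file` in nagios.cfg). The host names, author and comment are made up.

```python
import time

# Assumed path: the real location is whatever `command_file` is set to in nagios.cfg.
COMMAND_FILE = "/usr/local/nagios/var/rw/nagios.cmd"


def downtime_commands(hosts, start, duration_s, author, comment, now=None):
    """Build external command lines that put each host and all of its
    services into fixed downtime for the same window."""
    now = int(time.time()) if now is None else int(now)
    end = start + duration_s
    lines = []
    for host in hosts:
        for cmd in ("SCHEDULE_HOST_DOWNTIME", "SCHEDULE_HOST_SVC_DOWNTIME"):
            # Fields: host;start;end;fixed (1);trigger_id (0 = none);duration;author;comment
            # Note: author and comment must not contain ';' since it is the field separator.
            lines.append(
                f"[{now}] {cmd};{host};{start};{end};1;0;{duration_s};{author};{comment}"
            )
    return lines


def submit(lines, command_file=COMMAND_FILE):
    """Write the command lines to the Nagios command file (a named pipe)."""
    with open(command_file, "w") as pipe:
        for line in lines:
            pipe.write(line + "\n")


if __name__ == "__main__":
    hosts = ["web01", "db01"]  # hypothetical host names
    start = int(time.time())
    commands = downtime_commands(hosts, start, 4 * 3600, "admin", "DR cutover")
    for line in commands:
        print(line)  # dry run: inspect the lines before sending anything
    # submit(commands)  # uncomment on the Nagios server once the output looks right
```

Printing the lines first makes it easy to sanity check the start and end times before anything reaches Nagios. The host list could just as well be read from a file with one host per line.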

HTH

/S

keysorsoze · September 29, 2008, 10:38pm

Thanks Strides, your explanations and tips have helped a lot. Looks like I will just write a large shell script with the external commands.