Best practices for configuring escalation and high priority

treimers · January 15, 2013, 2:59pm

Hi all -

I’m looking for some specific guidance on best ways to do the following:

Configure key hosts to alert a larger group of people (including text messaging, which I do via email).
I don't know exactly how to approach this. Mainly, I'm talking about host pings here, i.e., getting more people notified
when a critical host cannot be reached for a period of time, or within minutes
when a key host is down (like a core switch or router).
How to escalate alerts from other, less important hosts into additional alert messages
without creating a never-ending flood of messages.

I don’t want to simply apply the ‘critical alert’ escalation to the same service that does PING, because that service is used by nearly all hosts.

I don't want to split my Cisco switches into two host groups.
I have configured the Cisco switches in one hostgroup per building,
so I didn't think having two PING services and two hostgroups was the best way.

I guess I'm confused about the best way to configure a core datacenter switch or router
for higher-priority alerting without either:
putting that host in a different group from its building-centric group, or

creating additional ping and SNMP monitoring traffic by having it in multiple groups, so that it is checked more than once
for the same event.

My apologies for asking.
I have read various documentation, but understanding syntax and commands
is not always the same as really understanding the best route to take for a configuration.
I can find lots of online discussions of 'getting started with Nagios',
but not many discussions of ongoing administration and design
for effective alerting that doesn't overload people so much that they begin to ignore alerts.