Cluster-based escalations


#1

I’m wondering if there is a way to do threshold-based escalations for a cluster of hosts/services.

For example, say you are running a large operation with a cluster of 10 mail servers, with enough headroom that you don’t have a high-priority problem unless 3 of the 10 mail servers go down, or 3 of the 10 have an unresponsive SMTP service check. Is there a way to define an escalation that is triggered only when 3 of 10 machines in a group go down, or 3 of 10 SMTP services in a group stop responding?

Thanks in advance,

Jay


#2

Hi!

I don’t know how to do that, but since I like finding other ways, here’s what I’d do:
Instead of having 10 services, each checking 1 server, I would write a little script that checks all 10 servers and reports an error only if 3 or more of them are down.

That’s just an idea; you can modify it to better suit your needs, like raising a warning if 1 or 2 servers are down and a critical if more, or a lot of other things like that :)
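
To make the idea concrete, here is a rough sketch of such a script in Python. It is only an illustration under some assumptions: the function names (`smtp_is_up`, `evaluate`, `check_cluster`, `main`), the thresholds (warning at 1 down, critical at 3 down) and the host names are made up for the example, and "up" here only means that a TCP connection to port 25 succeeds, which is a much cruder test than a real SMTP check. The exit codes follow the usual plugin convention (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN).

```python
#!/usr/bin/env python3
"""Sketch of an aggregate SMTP check written in the style of a monitoring plugin."""
import socket
import sys

# Conventional plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def smtp_is_up(host, port=25, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Assumption: a successful connect is treated as "SMTP is up".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def evaluate(down_hosts, total, warn_at=1, crit_at=3):
    """Map the list of down hosts to an (exit_code, status_line) pair."""
    down = len(down_hosts)
    if down >= crit_at:
        code, label = CRITICAL, "CRITICAL"
    elif down >= warn_at:
        code, label = WARNING, "WARNING"
    else:
        code, label = OK, "OK"
    detail = f" ({', '.join(down_hosts)})" if down_hosts else ""
    return code, f"SMTP CLUSTER {label}: {down}/{total} servers down{detail}"


def check_cluster(hosts, probe=smtp_is_up, warn_at=1, crit_at=3):
    """Probe every host and evaluate the number of failures against the thresholds."""
    down_hosts = [h for h in hosts if not probe(h)]
    return evaluate(down_hosts, len(hosts), warn_at, crit_at)


def main(argv):
    """Take host names as arguments, print one status line, return the exit code."""
    if not argv:
        print("SMTP CLUSTER UNKNOWN: no hosts given")
        return UNKNOWN
    code, message = check_cluster(argv)
    print(message)
    return code

# To run this as a standalone check, end the file with:
#     sys.exit(main(sys.argv[1:]))
```

Defined as a single service, a check like this lets the escalation hang off one aggregate result instead of 10 separate ones. The `probe` parameter is there so the per-host test can be swapped for something stricter without touching the threshold logic.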

Hope this helps.

By the way, if you find a better way, please share it; it might help me too ^^