Is there a way, when x services are all failing at once, to either send only one notification for all of them (if they are in the same group) or have Nagios test whether the host is up and send only the host-down message?
I thought I had everything working the way I wanted, then did a reboot and got a lot of email.
You should either set up host/service dependencies or define parents in your hosts.cfg file.
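A minimal sketch of the dependency approach, in the same object-config syntax as hosts.cfg. The host names (dbserver, webserver, core-switch) and service descriptions (MySQL, HTTP) are hypothetical. The assumption is that HTTP on webserver relies on MySQL on dbserver, and that webserver sits behind core-switch:

```
# Hypothetical: HTTP on webserver depends on MySQL on dbserver.
# Suppress HTTP notifications while MySQL is WARNING, UNKNOWN or CRITICAL;
# "n" for execution_failure_criteria means HTTP is still always checked.
define servicedependency{
        host_name                       dbserver
        service_description             MySQL
        dependent_host_name             webserver
        dependent_service_description   HTTP
        execution_failure_criteria      n
        notification_failure_criteria   w,u,c
        }

# Hypothetical: webserver is reached through core-switch.
# Suppress host notifications for webserver while core-switch is DOWN or UNREACHABLE.
define hostdependency{
        host_name                       core-switch
        dependent_host_name             webserver
        notification_failure_criteria   d,u
        }
```

With these in place, you would not get HTTP alerts for webserver while MySQL on dbserver is failing, or webserver host alerts while core-switch is down.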
You see, if Nagios checks a service and the check fails, it then checks the host, and if the host fails it checks whether the host's parent is down. If the parent fails, it checks its parent, and so on, until it finds a parent that is "UP". Now Nagios knows that the problem is not every single host, but one host that is blocking all the others from working. And since your contacts.cfg says you only want c, w, r, you don't get the "unreachable" emails. But if you don't have it set up like that, how would Nagios know what is important and what is not?
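For illustration, a contact that asks only for critical, warning and recovery service notifications, and leaves u out of host_notification_options so host UNREACHABLE notifications are not sent, might look like the sketch below. The contact name, email address, the 24x7 timeperiod and the notify-by-email / host-notify-by-email command names are assumptions based on typical sample configs; use whatever your own setup defines. Note that in service_notification_options, u means UNKNOWN, not unreachable.

```
# Hypothetical contact; 24x7 and the notification commands must exist in your config.
define contact{
        contact_name                    nagiosadmin
        alias                           Nagios Admin
        service_notification_period     24x7
        host_notification_period        24x7
        # services: critical, warning, recovery only
        service_notification_options    c,w,r
        # hosts: down and recovery only; no "u", so no UNREACHABLE emails
        host_notification_options       d,r
        service_notification_commands   notify-by-email
        host_notification_commands      host-notify-by-email
        email                           nagios-admin@localhost
        }
```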
I can't see a case where your httpd server goes down at the same time as the other server running ftp. If it does, then surely you want to know about it, don't you? If the problem is not actually the ftp or httpd server but a network problem, then why haven't you defined your network in Nagios? Nagios is not just for monitoring a bunch of PCs. Those PCs are plugged into the network, so you should be monitoring that too.
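A sketch of defining the network itself, assuming a hypothetical topology in which the Nagios server reaches a core router, then a core switch, with the httpd and ftp servers behind the switch. The host names, addresses and the generic-host template are assumptions:

```
# Hypothetical topology: Nagios server -> core-router -> core-switch -> servers.
# A host with no "parents" line is assumed to be on the Nagios server's own segment.
define host{
        use             generic-host
        host_name       core-router
        alias           Core Router
        address         192.168.1.1
        }

define host{
        use             generic-host
        host_name       core-switch
        alias           Core Switch
        address         192.168.1.2
        parents         core-router
        }

# Both servers hang off the switch.
define host{
        use             generic-host
        host_name       webserver
        alias           httpd server
        address         192.168.1.10
        parents         core-switch
        }

define host{
        use             generic-host
        host_name       ftpserver
        alias           ftp server
        address         192.168.1.11
        parents         core-switch
        }
```

If core-switch stops responding, Nagios marks it DOWN and marks webserver and ftpserver UNREACHABLE, so the alert points at the one real problem (the switch) rather than at each server.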
If you set up Nagios the way I've described in dozens of threads in this forum, such as the "adding switches" thread, you won't have the problem you describe.
But to answer your question flat out: NO. Nobody wants a tool like Nagios that sends only one email when 12 hosts are down if there really are 12 problems. But if there is actually only ONE problem, and it is blocking 11 other hosts from working, then getting 12 emails is due to not configuring Nagios as I've described.
PS: why would a reboot show all of your hosts down? Did the checks actually fail? They must have, since you got alerts. Does it behave this way every time you reboot?
If so, set these options in nagios.cfg so Nagios saves host and service state when it shuts down and restores it when it starts again:
retain_state_information=1
state_retention_file=/usr/local/nagios/var/status.sav
use_retained_program_state=1