Service check timeout


#1

Hi There,

My Nagios is working well, and it is great software.
My problem lies in the service check. Right now I'm getting numerous email and SMS notifications whenever Nagios is unable to reach a host, and by the time it does the second check the host is already up again. This has caused a lot of emails and SMS messages to be sent.
Is there a way I can make the host check wait at least 2 minutes and make sure the host is still down before it sends out the notifications?

The thing is, a host is marked as down when Nagios can't ping it, but this only lasts a short time, and within 2 minutes the hosts are up again. What can I do to overcome this? Are there any settings for it?

Any help with this is appreciated, as I'm getting around 300 SMS a night (host down, host up).

alphademon


#2

max_check_attempts 3
normal_check_interval 10
retry_check_interval 1

These settings give a notification only after 3 failed checks (one minute between each)…

You can even use service escalations… I get an SMS only after 30 minutes of downtime. I get 3 emails before an SMS is sent, and SMSs get sent only for really critical services. If FTP goes down on a web server I really don’t care at 3 in the morning… if HTTP goes down I do. :slight_smile:

Luca


#3

As Luca has more or less stated, it’s all down to your config settings. You are probably not setting max_check_attempts or something similar. Sure, pings will fail, but if they do fail, then surely you want to try again at least once before you wake someone up with a pager.


#4

Luca/Jakkedup,

Thanks for the heads-up, guys. Right now my settings for one host are as below.

define service{
    use                             generic-service ; Name of service template to use
    host_name                       MYKULN020
    service_description             PING
    is_volatile                     0
    check_period                    24x7
    max_check_attempts              10
    normal_check_interval           5
    retry_check_interval            2
    contact_groups                  lds-bo
    notification_interval           120
    notification_period             24x7
    notification_options            c,r
    check_command                   check-host-alive
    }

Does this mean a notification is sent after 10 failed checks, with 2 minutes between each? Should I change normal_check_interval to 10?

I'm making the changes today, so I hope not to receive a lot of SMS again.

You guys are my saviours!!

P.S. Jake, yes, I do want to try the ping at least 3 times before it actually sends me the SMS about the unreachable host, else I'll be going back to square one. So where should I look?

alphademon


#5

max_check_attempts 10
normal_check_interval 5
retry_check_interval 2

With these you should be getting a notification roughly 20 to 25 minutes after the host fails…
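
For anyone wanting to check the arithmetic, here is a rough sketch in Python. It is not anything Nagios ships; the function name is made up, and it assumes interval_length=60 (one interval = one minute), that the first failed check already counts as attempt 1, and that check execution time is negligible. Under those assumptions the window comes out a little shorter than the rough figure above.

```python
def notification_delay_minutes(max_check_attempts, normal_check_interval,
                               retry_check_interval, interval_length=60):
    """Return (best_case, worst_case) minutes from outage to first notification.

    Hypothetical helper for illustration only; interval_length is in seconds.
    """
    unit = interval_length / 60.0  # minutes per interval
    # Assumption: the first failed check is attempt 1, so only
    # (max_check_attempts - 1) retries follow it.
    retry_time = (max_check_attempts - 1) * retry_check_interval * unit
    # Best case: the outage starts just before a scheduled check.
    # Worst case: it starts just after one, so a full normal interval
    # passes before the first failed check.
    return retry_time, retry_time + normal_check_interval * unit


print(notification_delay_minutes(10, 5, 2))   # settings from post #4
print(notification_delay_minutes(3, 10, 1))   # settings from post #2
```

The worst case adds one normal_check_interval because the outage may begin right after a successful check.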

Luca


#6

max_check_attempts 10
normal_check_interval 5
retry_check_interval 2

normal_check_interval 5 # checks are performed every 5 intervals (check interval_length=x in your nagios.cfg to see how long an interval is)
max_check_attempts 10 # if a check fails, it is retried until it has run a total of 10 times. During this time no notifications are sent, and the status information shows the output it is getting. The “attempt” column in “service problems” shows 1/10 for the attempt it is currently on.
retry_check_interval 2 # if a check fails, a retry is attempted every 2 intervals.

Now, since a retry occurs every 2 intervals for 10 attempts, that works out to about 20 intervals, but your normal check interval is 5. That’s an overlap, and I’ve never tested it. You can test it and make sure the “attempt” column reaches 10/10 before a notification goes out by taking a device down or pointing a check at a bogus host address.
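
To know what to look for during such a test, here is a small Python sketch of the expected “attempt” timeline. It is only a model, not Nagios code: the function name is hypothetical, and it assumes that while a problem is unconfirmed the retries are scheduled at retry_check_interval instead of normal_check_interval (so the two do not stack), that interval_length is 60 seconds, and that checks take no time to run.

```python
def attempt_timeline(max_check_attempts, retry_check_interval, interval_length=60):
    """List (minutes_since_first_failure, "n/max") for each attempt.

    Hypothetical model: assumes the problem persists through every retry.
    """
    unit = interval_length / 60.0  # minutes per interval
    return [((n - 1) * retry_check_interval * unit, f"{n}/{max_check_attempts}")
            for n in range(1, max_check_attempts + 1)]


# Expected "attempt" column for the 10 / 2 settings discussed above.
for minutes, attempt in attempt_timeline(10, 2):
    print(f"t+{minutes:g} min  attempt {attempt}")
```

If the real “attempt” column advances on a different schedule during the test, one of the assumptions above does not hold for your setup.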

I would suggest using
max_check_attempts 3
normal_check_interval 5
retry_check_interval 1
which should be plenty even for a ping.
BTW, you should be using check_fping rather than ping. It replies more quickly, which matters when you are running thousands of checks.


#7

Hi Luca/jakkedup,

Thanks so much for the valuable information.
The new settings work like a charm now. I'll get check_fping.
You guys rock!

alphademon


#8

:smiley:
I’d add that a good share of the thanks should go to Evert for his forum :wink:

Ciao, Luca