Nagios polling interval

himohit · July 20, 2005, 7:40pm

How often does Nagios poll the servers, and where can I change how often Nagios checks the servers and their services? Does it email immediately when a service is down?
Of late, one of the services I was monitoring with Nagios went down, and the email notification came about 10 minutes later.
Can someone help me find the cause of why this could have happened?
Thanks :)

luca · July 21, 2005, 10:05am

max_check_attempts 3
normal_check_interval 10
retry_check_interval 1

I suppose this should answer your question…
The host is checked every 10 minutes; IF it fails, it is checked another 2 times at 1-minute intervals, and if the third check fails a notification is sent.
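To put rough numbers on that sequence, here is a minimal Python sketch. It is a simplified model, not Nagios code: the function name `notification_delay_bounds` is made up for illustration, and it assumes interval_length=60 and ignores check run time, scheduler latency and mail delivery.

```python
def notification_delay_bounds(normal_check_interval, retry_check_interval,
                              max_check_attempts, interval_length=60):
    """Return (best, worst) seconds from failure to first notification.

    Simplified model: ignores check run time, scheduler latency and
    notification delivery time.
    """
    # After the first failed check there are (max_check_attempts - 1) re-checks.
    retries = max_check_attempts - 1
    confirm = retries * retry_check_interval * interval_length
    # Best case: the service dies just before a scheduled check.
    best = confirm
    # Worst case: it dies just after one, so a full normal interval passes first.
    worst = normal_check_interval * interval_length + confirm
    return best, worst


best, worst = notification_delay_bounds(10, 1, 3)
print(f"best ~{best // 60} min, worst ~{worst // 60} min")
```

With the values above this gives roughly 2 to 12 minutes, depending on where in the 10-minute cycle the failure happens.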

It’s your configuration, not Nagios.

Luca

jakkedup · July 21, 2005, 2:53pm

Luca is right: you tell us how often Nagios checks and retries before a notification is sent. We can’t see how you set up the .cfg files unless you tell us.

himohit · July 21, 2005, 7:11pm

Thank you very much, Sir Luca and Jake.
Here is a part of my services.cfg file, which clearly indicates that my polling interval was 10 minutes, which is quite high.
Sir, if I have to set it to a 30-second interval, will the max_check_attempts be .60? Does Nagios recognize a decimal point? Also, is the email notification sent immediately when there is a service interruption? Thanks

    define service{
    use                             generic-nix    ; Name of service template to use
    host_name                       TS01
    service_description             UPTIME
    is_volatile                     0
    check_period                    24x7
    max_check_attempts              3
    normal_check_interval           5
    retry_check_interval            1
    contact_groups                  Admins
    notification_interval           120
    notification_period             none
    notification_options            c,r
    check_command                   check_nt_uptime
    }

himohit · July 21, 2005, 7:12pm

Sorry, that's 5 minutes and not 10.

himohit · July 21, 2005, 7:34pm

Another addition: my notification period is not none, it is set to 24X7.

jakkedup · July 21, 2005, 8:07pm

max_check_attempts 3
normal_check_interval 5
retry_check_interval 1

So the service is checked every 5 minutes, provided you have interval_length=60 in your nagios.cfg. You have a retry of 1 and a max check attempts of 3, so that makes about 7 minutes in the worst case: up to 5 minutes until the next scheduled check, plus two retries 1 minute apart. Sounds like it’s working just the way you have it defined, give or take 2 minutes.

But if what you really want is to be notified in approximately 5 minutes, then you need to define it that way:
max_check_attempts 1
normal_check_interval 5
retry_check_interval 1

Per the docs:
“max_check_attempts: This directive is used to define the number of times that Nagios will retry the service check command if it returns any state other than an OK state. Setting this value to 1 will cause Nagios to generate an alert without retrying the service check again.”

jakkedup · July 21, 2005, 8:09pm

Use caution though: some service checks might fail just because of load on your network at that moment, timeouts, packet loss, etc. Not allowing check_ping a second chance one minute later might get you lots of notifications that have nothing to do with the service and everything to do with your network.
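A toy illustration of that trade-off, as a hedged Python sketch. It is only a simplified model of the soft/hard state idea (alert once max_check_attempts consecutive checks have failed), and `first_notification` is a hypothetical helper, not anything that ships with Nagios.

```python
def first_notification(results, max_check_attempts):
    """Return the index of the check that triggers a notification, or None.

    `results` is a list of booleans (True = check returned OK), in the order
    the checks ran. Simplified model: a problem is only notified once
    max_check_attempts consecutive checks have failed.
    """
    consecutive_failures = 0
    for index, ok in enumerate(results):
        if ok:
            consecutive_failures = 0  # a recovery resets the attempt counter
            continue
        consecutive_failures += 1
        if consecutive_failures >= max_check_attempts:
            return index
    return None


blip = [True, False, True, True]       # one timed-out check, then fine again
outage = [True, False, False, False]   # service really is down

print(first_notification(blip, 1))    # 1: alerts on the single timeout
print(first_notification(blip, 3))    # None: the retry succeeded, no alert
print(first_notification(outage, 3))  # 3: alerts on the third failed check
```

So max_check_attempts 1 is the fastest to alert, but it also alerts on every one-off timeout.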

himohit · July 21, 2005, 8:27pm

As of now I have put this:
max_check_attempts 3
normal_check_interval 1
retry_check_interval 1
Also interval_length=30 (seconds).
Does this mean it will check every 30 seconds? Does the notification come immediately when Nagios sees any service down during that interval?

himohit · July 21, 2005, 8:30pm

Sir,
the scenario in my case is as follows. When a particular service goes down, I need a warning as soon as possible, because that service is responsible for our uptime or downtime. That's the reason I am trying to keep a continuous check on that service, so it can generate an alarm and email the whole responsible IT group immediately.
Thanks

jakkedup · July 21, 2005, 9:49pm

interval_length=30 means each interval is 30 seconds.
Since normal_check_interval is 1, that means 1 interval, which is 30 seconds. I don’t know why you would want to check something every 30 seconds.
retry_check_interval 1 is also 30 seconds.
max_check_attempts 3
So if a service fails, it will be re-checked 2 more times (3 attempts in total), each retry 30 seconds apart. You do the math, my head hurts.
I don’t know why you are doing this, but it’s excessive if you ask me.
60-second intervals make for easy math.
retry_check_interval 5 means 5 minutes, nice and easy.
retry_check_interval 1 together with max_check_attempts 1 is how I would do it, but hey, whatever works for you.
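For anyone who does want the math done, a rough Python sketch follows. It assumes max_check_attempts counts the first failed check too (which is how the docs quote earlier in the thread reads), ignores check run time and scheduling latency, and `check_timeline` is just an illustrative name.

```python
def check_timeline(retry_check_interval, max_check_attempts, interval_length):
    """Seconds (counted from the first failed check) at which each attempt runs."""
    step = retry_check_interval * interval_length
    return [attempt * step for attempt in range(max_check_attempts)]


# Values from this thread: interval_length=30, retry 1, max attempts 3.
timeline = check_timeline(1, 3, 30)
print(timeline)  # [0, 30, 60]

# Add the longest possible wait for the first check to come around.
normal_check_interval = 1
wait_for_first_check = normal_check_interval * 30
print(wait_for_first_check + timeline[-1])  # 90
```

So with those settings the notification should go out somewhere between about 60 and 90 seconds after the service dies.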

himohit · July 22, 2005, 12:36am

Ah, now I get a clearer picture. Thanks

luca · July 22, 2005, 8:00am

Anyway, as Jakkedup said, give it at least a second chance to cut out most false warnings.

Luca