More granular host monitoring?


#1

Hi there …

Nagios newbie so please bear with me!!
I have been charged with rolling out Nagios as our new network monitoring system.
Everything so far seems to have the potential to work okay, except for “host down” monitoring.
From what I can see so far, the gap between a host going down and an alert being generated by Nagios is defined by the “check_interval” directive, and the lowest value this can be set to is one minute.
So potentially, it could be that long before we are aware that one of our hosts is down.
For my purposes this is far too long. We require pretty much instantaneous notification (or as close to it as is possible) of a host being down.
Can a shorter check interval for a host be configured anywhere?
Or would a trap-based solution be more feasible?

Any help or pointers would be greatly appreciated!!

An Brutog


#2

I found you can set the intervals to less than a minute, like:

normal_check_interval   0.25
retry_check_interval    0.25

for 15s check/retry intervals (0.25 of a time unit, assuming interval_length is left at its default of 60 seconds)

HTH

/S


#3

Thanks a mill Strides, getting notifications quicker now.
Unfortunately not quick enough though, there still seems to be about 60 to 90 secs between a host going down and the time I’m notified of the problem.
Which directives can I modify to speed this process up?


#4

Change max_check_attempts to something like 1. When a service check running under the ‘normal’ check interval fails, the service goes into a ‘SOFT’ down state. Nagios then retries the check at the ‘retry’ interval until it has failed max_check_attempts times in a row, and the first failed check counts as attempt 1. So if you have this set to something like 5 at the moment, you have a ‘normal’ check fail, then 4 x 15 sec interval ‘retry’ checks before the service goes ‘HARD’ down and the notification is sent. Add up to 15 secs for the first check to come round after the host dies, and that’d make it between 60 and 75 secs.
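
As a rough sketch of how the directives mentioned in this thread could sit together in one service definition: the host name myhost, the generic-service template and the check_ping command with its thresholds are assumptions borrowed from the stock sample configs, not taken from your setup, so swap in whatever you already have.

```
# Sketch only: host_name, template and check_command are placeholders.
define service{
        use                     generic-service   ; assumed template from the sample config
        host_name               myhost            ; hypothetical host
        service_description     PING
        check_command           check_ping!100.0,20%!500.0,60%
        normal_check_interval   0.25   ; 15s, if interval_length is the default 60
        retry_check_interval    0.25   ; 15s between re-checks of a failing service
        max_check_attempts      1      ; first failure is already HARD, no retries
        }
```

The trade-off is that with a single attempt, one failed check is enough to trigger a notification, so a value of 2 or 3 may be a more comfortable middle ground if you start seeing false alarms.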

HTH

/S