Ease up on my check intervals?


#1

I’m more or less looking for an opinion here:

Now that I’ve got Nagios running smoothly, I’ve substantially added to the number of hosts and services we’re monitoring. Here’s what I’ve got right now…

18 hosts

41 services

#Services.cfg
check_interval 5
retry_check_interval 1

I’ve set that check_interval and retry_check_interval for every service that’s being monitored. I’d prefer that the check_interval be consistent in every hostgroup. I’m thinking, though, that mayhap having Nagios check each service every 5 minutes will be too much because there are 41 to check (there used to be only 26). Opinions?


#2

I have over 500 at 10 minutes… not an issue. Check your server load with “top” or any other tool and see if the server has problems, but I doubt it (or are you using an 8086 processor?).
41 checks in 5 minutes is only about one check every 7 seconds…

Luca
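
For anyone who wants to sanity-check the spacing, here is a rough Python sketch of the arithmetic. The function name `seconds_between_checks` is made up for illustration, and it assumes the checks are spread evenly over the interval, which Nagios's scheduler only approximates.

```python
def seconds_between_checks(num_checks: int, interval_minutes: float) -> float:
    """Average gap in seconds between checks, assuming an even spread.

    Hypothetical helper for illustration; not part of Nagios.
    """
    if num_checks <= 0:
        raise ValueError("num_checks must be positive")
    return interval_minutes * 60 / num_checks


# 41 services checked every 5 minutes
print(round(seconds_between_checks(41, 5), 1))    # 7.3
# 500 services checked every 10 minutes
print(round(seconds_between_checks(500, 10), 1))  # 1.2
```

So a box that copes with 500 checks at 10 minutes is already running checks several times more often than 41 checks at 5 minutes would require.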


#3

[quote=“luca”]I have over 500 at 10 minutes… not an issue. Check your server load with “top” or any other tool and see if the server has problems, but I doubt it (or are you using an 8086 processor?).
41 checks in 5 minutes is only about one check every 7 seconds…

Luca[/quote]

Hah… well, now the bossman says he wants us running checks every minute on each of our 41 services. I think a few of the more critical services could reasonably be monitored that often, but I don’t think it’s necessary for all of them, even though my boss views all of these hosts as equally critical. Shrugs. Nagios is holding up just fine so far. I guess we’ll see how this goes.


#4

Checking every minute is excessive, if you ask me. But with as few services as you have, it shouldn’t be a problem to run them.


#5

Yeah, I agree. But doing this has allowed us to discover a potential issue on one of our other servers. When I got here, my boss had already installed an older development version of Nagios. The one I installed is running just fine, as far as I can tell, but for some reason the original Nagios seems kind of flaky. It keeps giving us Host Down alerts at random when the other (newer) Nagios server doesn’t report anything. I’ve checked the configs and all is well on both boxes: the check intervals and interval lengths are all the same, so I’m guessing there’s something wrong locally on the first Nagios box.

As a completely unrelated side note, I installed SUSE 10.0 on one of our other servers and found that it seems to come with a version of Nagios already on it. It has some plugins, though, that I haven’t seen with the version of Nagios I’m running now.


#6

Oops, I just noticed that it was your retry interval that is set to 1, and your normal check interval is set to 5, which is perfectly normal and should not be any problem at all. I’m making 1225 checks every 5 minutes, but 684 of them are passive checks, so that makes 541 active checks.
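
To put those figures side by side, here is a small Python sketch. The helper `active_check_rate` is a made-up name for illustration, and it assumes only active checks cost the Nagios server scheduling work and that they are spread evenly over the interval.

```python
def active_check_rate(total_checks: int, passive_checks: int,
                      interval_minutes: float) -> float:
    """Active checks per second, assuming an even spread over the interval.

    Hypothetical helper for illustration; not part of Nagios.
    """
    active = total_checks - passive_checks
    return active / (interval_minutes * 60)


# 1225 checks every 5 minutes, 684 of them passive
print(1225 - 684)                                 # 541
print(round(active_check_rate(1225, 684, 5), 2))  # 1.8
# 41 services, all active, checked every minute
print(round(active_check_rate(41, 0, 1), 2))      # 0.68
```

Even at one-minute intervals, 41 active checks come out well under the rate of the larger setup described above.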