Nagios performance page

Bibelo · April 4, 2011, 8:41am

Hello.

I’ve got a question about the Nagios performance management page here.

There are two sections that look contradictory:
Schedule regular host checks
Enable cached host checks

The first one, Schedule regular host checks, reads:
Scheduling regular checks of hosts can actually help performance in Nagios

The second one, Enable cached host checks, reads:
on-demand host checks can benefit from caching (…) In order for cached checks to be effective, you need to schedule regular checks of your hosts

It looks a bit contradictory to me. I thought it was either scheduled host checks or on-demand host checks, not both at the same time. But here it says you have to use them together, and that’s not very clear to me. Can someone please explain?

Thank you very much.

rabinnh · April 4, 2011, 12:14pm

Let’s say you schedule regular host checks, and they are cached. Now a service or dependent host is unreachable. Nagios can check the cached state, and if it’s recent enough, it may not have to do another “on demand” check, thus improving performance.
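To illustrate the idea, here is a minimal Python sketch of the decision described above. It is not Nagios code: the function name and the 15-second value are assumptions chosen for illustration, with the constant standing in for the `cached_host_check_horizon` setting in the main configuration file.

```python
from datetime import datetime, timedelta
from typing import Optional

# Illustrative stand-in for Nagios' cached_host_check_horizon (value assumed).
CACHE_HORIZON = timedelta(seconds=15)


def needs_on_demand_check(
    last_check: Optional[datetime],
    now: datetime,
    horizon: timedelta = CACHE_HORIZON,
) -> bool:
    """Return True when there is no cached result recent enough to reuse."""
    if last_check is None:
        # The host has never been checked, so there is nothing to reuse.
        return True
    # A cached result is usable only if it is no older than the horizon.
    return now - last_check > horizon
```

With regular checks scheduled, `last_check` is refreshed often, so this kind of test comes back `False` more frequently and fewer on-demand checks are run. Without regular checks, the cached state is usually too old to be of any use.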

I can tell you what I do, which seems to work well. For every host that I actually want to monitor services on, I always create a separate “ping” service definition with a 1 minute interval. I do this so when I look at performance graphs I always have a measure of latency. For hosts that are just in the “path”, I just define the host and make sure my dependencies are set so I will know where the issue is (see following paragraph).

I don’t schedule regular host checks. If your dependencies are set up correctly, Nagios will execute a host check when a dependent host is down or unreachable.

Of course there are about 1000 ways to skin this cat, but my goals are to get prompt alerts, suppress unnecessary notifications, and record performance data for important hosts and services, so I feel that this type of configuration strikes a nice balance between performance and detail. It seems to be very efficient: I can scan hundreds of hosts and almost 1000 services using a tiny fraction of the CPU on a dual Atom system.

BTW, one of the other things that I do to improve efficiency is to put spool/checkresults and cache/nagios on tmpfs so that they are written and read from memory rather than disk.

Bibelo · April 5, 2011, 8:58am

Thank you for your very interesting answer.

The tmpfs tip is very interesting. How much memory did you allocate for these directories?

I’ve checked and it appears that:
/var/lib/nagios3/spool uses about 3 MB
/var/cache/nagios3 uses about 20 MB

So I plan to allocate about twice that, around 50 MB.
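The sizing arithmetic above (about 23 MB in use, doubled and rounded up to 50 MB) can be sketched in Python as follows. The helper names, the 2x headroom factor and the 10 MB rounding step are assumptions made for this example, not anything Nagios or tmpfs prescribes.

```python
import math
import os


def dir_size_bytes(path: str) -> int:
    """Sum the apparent sizes of all regular files below path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.islink(full):
                continue  # do not count symlink targets
            try:
                total += os.path.getsize(full)
            except OSError:
                pass  # spool files are short-lived and may vanish mid-scan
    return total


def suggested_tmpfs_mb(used_bytes: int, headroom: float = 2.0, step_mb: int = 10) -> int:
    """Scale current usage by headroom and round up to a multiple of step_mb."""
    scaled_mb = used_bytes / (1024 * 1024) * headroom
    return int(math.ceil(scaled_mb / step_mb) * step_mb)


if __name__ == "__main__":
    # Paths taken from the post above; adjust them for your installation.
    for path in ("/var/lib/nagios3/spool", "/var/cache/nagios3"):
        used = dir_size_bytes(path)
        print(path, used // (1024 * 1024), "MB used, suggest",
              suggested_tmpfs_mb(used), "MB")
```

For the combined 23 MB measured above, `suggested_tmpfs_mb(23 * 1024 * 1024)` gives 50.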

As for scheduled versus on-demand host checks, it’s not clear from the configuration I have on my servers (not made by me). Actually, as I explain in this other post, there’s no check_interval in the host template definition… So I will try setting check_interval to zero and see the result.

I’m not interested in having host checks. Actually, we are polling routers, so there are service checks already polling, and it seems it’s also doing host checks for all the devices…

rabinnh · April 5, 2011, 12:33pm

The way tmpfs works, the system will only use what it needs; it doesn’t reserve memory up front. So don’t be afraid to allocate enough that you know you won’t run out, but take into account how much free memory you typically have on your system and give yourself plenty of slack.
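One way to see this for yourself is to compare a mount's size limit with what is actually in use. The sketch below uses Python's `os.statvfs`; the function name is made up for this example, and `/dev/shm` is only assumed to be a tmpfs mount (it is on most Linux systems), so substitute your own mount point.

```python
import os
from typing import Tuple


def fs_usage_mb(mount_point: str) -> Tuple[float, float]:
    """Return (size limit, space in use) in MB for the filesystem at mount_point."""
    st = os.statvfs(mount_point)
    total = st.f_blocks * st.f_frsize                # configured size of the filesystem
    used = (st.f_blocks - st.f_bfree) * st.f_frsize  # blocks actually holding data
    return total / 2**20, used / 2**20


if __name__ == "__main__":
    path = "/dev/shm"  # assumed tmpfs mount; replace with your Nagios tmpfs path
    if os.path.isdir(path):
        limit_mb, used_mb = fs_usage_mb(path)
        print(f"{path}: limit {limit_mb:.0f} MB, in use {used_mb:.1f} MB")
```

On a tmpfs mount the first number is the cap you configured, and only the second number is actually backed by memory (or swap).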

Bibelo · April 5, 2011, 12:47pm

Cool!

You mean that if I create a 200 MB tmpfs and the system only needs 20 MB, it will use only 20 MB of memory? But at the same time, I should not allocate too much.

Thank you.