Nagios polling & nagios_grapher


I have configured a Nagios system monitoring some 120 servers (soon to be 300) with, currently, about 500 services. I am using the nagios_grapher add-on to provide RRD graphs for selected metrics. Getting nagios_grapher (alpha version) working was a feat in itself, but I really like it.

I have noticed, though, that every now and again there are gaps in the graphs, which I can only put down to Nagios not pulling the information back in a timely fashion. In particular, if a number of hosts go down (along with their services), it seems to slow down Nagios polling as a whole, so all graphs then start missing data. We had a planned outage of 15 servers, and during the 4-hour outage none of the graphs in Nagios stored any data.

Any suggestions?

Put higher heartbeat values on the data sources (DS) in your RRDs; the heartbeat is set per data source, not per RRA.
If you have a 300-second step you may want a 1500-second heartbeat; this way, if you miss 4 values, RRDtool will still bridge the gap rather than record unknown values.
Anyway, this is not fail-safe. I'm still missing some values even in RRDs where I am sure I get all values pulled from Nagios (I'm using nagiosgraph, by the way).
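
For illustration, here is a rough Python sketch of the heartbeat arithmetic above and of the `rrdtool tune` call that would apply it to an existing file. The file path and the data source name `value` are placeholders I made up, not anything nagios_grapher or nagiosgraph is known to use; check the real DS names with `rrdtool info` first. The helper names are hypothetical too.

```python
def heartbeat_for(step_seconds, tolerated_misses):
    """Heartbeat that still accepts an update after `tolerated_misses` skipped polls.

    With a 300 s step and 4 missed polls, the next update arrives 1500 s
    after the previous one, so the heartbeat must be at least 1500 s.
    """
    return step_seconds * (tolerated_misses + 1)


def tune_command(rrd_path, ds_name, heartbeat):
    """Build the argument list for `rrdtool tune --heartbeat ds-name:heartbeat`."""
    return ["rrdtool", "tune", rrd_path, "--heartbeat", f"{ds_name}:{heartbeat}"]


if __name__ == "__main__":
    hb = heartbeat_for(300, 4)
    # Placeholder path and DS name (assumptions, replace with your own).
    cmd = tune_command("/path/to/service.rrd", "value", hb)
    # Only prints the command; pass `cmd` to subprocess.run(cmd, check=True)
    # if you actually want to apply it.
    print(" ".join(cmd))
```

As noted above, a larger heartbeat only hides short gaps. It would not have helped with a 4-hour stretch without updates unless the heartbeat were set longer than that.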