I have configured a Nagios system that monitors some 120 servers (soon to be 300) with, currently, about 500 services. I am using the nagios_grapher add-on to provide RRD graphs for selected metrics. Getting nagios_grapher working was a feat in itself, but I really like it (alpha version).
I have noticed, though, that every now and again there are gaps in the graphs, which I can only put down to Nagios not pulling the information back in a timely fashion. In particular, if a number of hosts go down (and, with them, their services), it seems to slow down all of Nagios's polling, so every graph then starts missing data. We had a planned outage of 15 servers, and during the 4-hour outage none of the graphs in Nagios were storing data.
Any suggestions?