I know Nagios’ config contains an inter_check_delay_method value, which controls how Nagios spreads out service checks when it starts up. We’re currently using “smart”.
Also of note, we currently have retain_state_information set to 0. With retention enabled, changes to our service check definitions didn’t actually take effect when the system was reloaded, which was annoying. (We use a custom in-house Nagios configuration manager.)
We’ve got about 2500 service checks set up in Nagios. When we restart, it takes a while for all of them to get their first check. The annoying part is that, under the “smart” initial distribution, a service that is normally checked every minute can wait several minutes for its first post-restart check.
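As a rough illustration of why a one-minute check can wait that long, here is a minimal Python sketch of the “smart” calculation as the Nagios scheduling documentation describes it: the inter-check delay is the average check interval across all services divided by the number of services, and first checks are launched one delay apart. It assumes the delay is capped so the whole spread fits inside max_service_check_spread (30 minutes by default), ignores host interleaving and concurrency limits, and uses a hypothetical interval mix rather than the real configuration. The function name is made up for illustration.

```python
def smart_initial_spread(check_intervals_min, max_service_check_spread_min=30):
    """Approximate the "smart" initial spread of service checks (hypothetical helper).

    Assumed formula, per the Nagios scheduling docs:
        delay = average check interval / number of services
    capped so that every first check fits inside max_service_check_spread.
    Ignores host interleaving and max_concurrent_checks.
    Returns (delay_seconds, seconds_until_last_first_check).
    """
    n = len(check_intervals_min)
    if n == 0:
        raise ValueError("need at least one service")
    avg_interval_s = sum(check_intervals_min) * 60.0 / n
    delay = avg_interval_s / n
    # Assumed cap: n first checks must fit inside the spread window.
    delay = min(delay, max_service_check_spread_min * 60.0 / n)
    # First checks launch one delay apart, so the last starts (n - 1) delays in.
    return delay, delay * (n - 1)


if __name__ == "__main__":
    # Hypothetical mix: 500 one-minute checks and 2000 six-minute checks.
    intervals = [1] * 500 + [6] * 2000
    delay, last = smart_initial_spread(intervals)
    print(f"delay={delay:.2f}s, last first check after ~{last:.0f}s")
```

With that hypothetical mix the average interval is 5 minutes, so the delay is 300 / 2500 = 0.12 s and the first checks are spread over roughly 300 s. A one-minute service can land anywhere in that window, which is consistent with a multi-minute wait after restart.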
The question I have is: if I enable state retention, would that have any impact on Nagios’ post-restart scheduling?
The real goal is to have service checks resume as quickly as possible after a restart, almost as if the system had not been restarted. Understandably, that’s not fully achievable.
The reason for this interest is that we save the performance data from many of our service checks, often checks that execute once per minute. But when Nagios is restarted, some of those once-a-minute checks don’t run again for about 5 minutes, so we lose performance data every time we restart Nagios.
So, would it be foolish to change inter_check_delay_method to a fixed value? Would enabling state retention get service checks resuming any faster?
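For comparison, setting inter_check_delay_method to a number of seconds launches first checks one fixed delay apart, regardless of the check intervals. Below is a minimal sketch of the resulting startup window. It assumes no host interleaving, no max_concurrent_checks limit, and no max_service_check_spread cap. The helper name and the delay values are hypothetical.

```python
def first_check_window(num_services, delay_s):
    """Seconds from startup until the last service's first check (hypothetical helper).

    Assumes first checks are launched one fixed delay apart, with no host
    interleaving, no max_concurrent_checks limit and no spread cap.
    """
    if num_services < 1:
        raise ValueError("num_services must be at least 1")
    return delay_s * (num_services - 1)


if __name__ == "__main__":
    # Hypothetical manual delays, in seconds, for 2500 services.
    for delay in (0.01, 0.05, 0.12):
        window = first_check_window(2500, delay)
        print(f"delay={delay:.2f}s -> last first check after ~{window:.0f}s")
```

A 0.01 s delay would start all 2500 first checks within about 25 s, at the cost of a burst of check load right after startup. On the retention question, the Nagios main configuration documentation lists a use_retained_scheduling_info option that, when state retention is enabled, keeps saved next-check times across restarts. Whether that actually shortens the gap here, and how it interacts with the configuration manager issue described earlier, is worth testing rather than assuming.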