I’m working with Nagios 3.2.1. My hosts are set up to retain status information and non-status information across restarts. However, whenever I restart or reload Nagios, the Host Status duration counter for each host, except the localhost, resets to 0d 0h 0m 0s. Why is that? I’d like the counter values for the hosts to be maintained over reloads, as they are for the localhost, but I can’t seem to find what I’m missing in order to achieve this.
Why do you need to restart?
Anyway, there must be something wrong in your setup. Check the logs for errors; it probably can’t write or read the retention file somehow.
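A quick way to rule the retention file in or out is to pull the retention directives from the main config and check the file's permissions. Below is a minimal Python sketch, assuming the main config uses the standard `retain_state_information` and `state_retention_file` directives in `key=value` form. The helper names `read_directives` and `retention_problems` are made up for illustration.

```python
import os

def read_directives(cfg_text):
    """Parse key=value lines from nagios.cfg-style text, skipping comments."""
    directives = {}
    for line in cfg_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Repeated keys (e.g. cfg_file) overwrite each other; that is fine
        # here because the retention directives appear only once.
        directives[key.strip()] = value.strip()
    return directives

def retention_problems(directives):
    """Return a list of human-readable problems with the retention setup."""
    problems = []
    if directives.get("retain_state_information") != "1":
        problems.append("retain_state_information is not set to 1")
    path = directives.get("state_retention_file")
    if not path:
        problems.append("state_retention_file is not set")
    elif os.path.exists(path) and not os.access(path, os.R_OK | os.W_OK):
        problems.append("cannot read/write " + path)
    return problems
```

To use it, read the text of your own nagios.cfg (its location varies by install) and pass it through `read_directives` and then `retention_problems`. Run it as the user the Nagios daemon runs as, otherwise the permission check says little. An empty list only means these basic checks passed.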
Because the hosts I monitor change regularly; they are added and removed on a frequent basis.
Actually, no. I thought so too at first, but there are no errors, and the hosts exhibiting the behavior described above have the same configuration attributes as the localhost, which does not behave that way.
I did end up figuring it out, though, and the answer is quite a bit simpler. The time shown in the host status field is relative to the host’s last status change. When a host is newly added, its status is initially PENDING until the first host check is performed. From PENDING it changes to either UP or DOWN. If the host goes from PENDING to UP, the duration begins counting up, but Nagios does not consider this a status change as far as the retention data is concerned, so if a reload is issued before the host changes state again, the counter zeroes out. Once the host goes down (changes state) for the first time, however, the counter values hold over reloads. The same is not true if a newly added host goes from PENDING to an initial state of DOWN. In that case, the counter values are held over reloads.
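To see which hosts are in this situation, you can look for host entries in the retention file that have no recorded state change. The sketch below assumes the retention file is made of `host { ... }` blocks containing `host_name=` and `last_state_change=` lines, with `last_state_change=0` meaning nothing has been recorded yet. That matches what a Nagios 3 retention file usually looks like, but check it against your own file. The function name is made up for illustration.

```python
def hosts_without_state_change(retention_text):
    """Return names of hosts whose last_state_change is 0 or missing."""
    names = []
    block = None  # dict for the host block being read, or None outside one
    for raw in retention_text.splitlines():
        line = raw.strip()
        if line == "host {":
            block = {}
        elif line == "}":
            # Closing brace: evaluate the host block, if we were inside one.
            if block is not None and block.get("last_state_change", "0") == "0":
                names.append(block.get("host_name", "?"))
            block = None
        elif block is not None and "=" in line:
            key, value = line.split("=", 1)
            block[key] = value
    return names
```

Pass it the contents of the file named by `state_retention_file`. Service blocks and other sections are ignored, so the result lists only hosts.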
Nagios identifies hosts whose counters zero out on reload because they have never changed state (or, perhaps more accurately, whose initial state has not been time-stamped in the retention file) by putting a plus (+) to the right of the elapsed time value. The plus remains until the host changes state for the first time, which places a value in the retention file. The presence of this indicator leads me to believe the logic described above is by design and not an error. What I haven’t determined yet is whether there’s a setting to change this, so that the counter for a new host that goes from PENDING to UP is treated the same as one that goes from PENDING to DOWN as its initial state.
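A minimal sketch of logic that would produce this display, written in Python purely as an illustration. It assumes that when no state change has been recorded, the duration is counted from the time the Nagios process last started and a plus is appended. That is a guess consistent with the behavior described above, not something confirmed from the Nagios source. The function name `status_duration` is hypothetical.

```python
def status_duration(now, last_state_change, program_start):
    """Format a duration string; append '+' when no state change is recorded.

    All arguments are Unix timestamps in seconds. A last_state_change of 0
    is taken to mean "never recorded" (an assumption).
    """
    unknown = last_state_change == 0
    # With no recorded change, fall back to counting from program start,
    # which is why the counter appears to reset on every reload.
    elapsed = max(now - (program_start if unknown else last_state_change), 0)
    days, rest = divmod(elapsed, 86400)
    hours, rest = divmod(rest, 3600)
    minutes, seconds = divmod(rest, 60)
    return "%dd %dh %dm %ds%s" % (days, hours, minutes, seconds, "+" if unknown else "")
```

Under these assumptions, a host with no recorded state change shows `0d 0h 0m 0s+` right after a reload, while a host with a recorded change keeps counting from that timestamp.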
Do not use restart. If you need to restart, stop Nagios and then start it. If you changed the config, use reload.
Not sure if this changes the counters’ behaviour, but I’d expect them to remain where they are instead of resetting.
I am using reload, not restart. I suppose I should have been clearer about that; I just noticed I said restart in the title of the thread. Either way, that doesn’t seem to have an impact on what’s described above.