forums.meulie.net

Issue with status retention at startup in 2.0b3

Hi, new to this forum but not to Nagios, although I’m stuck here.

Each time I restart or reload Nagios, all hosts and services come back as pending and get rescheduled, even though I have the following settings:

nagios.cfg:status_file=/usr/local/nagios/var/status.dat
nagios.cfg:retain_state_information=1
nagios.cfg:state_retention_file=/usr/local/nagios/var/retention.dat
nagios.cfg:retention_update_interval=60
nagios.cfg:use_retained_program_state=1
nagios.cfg:use_retained_scheduling_info=1
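
A quick way to double-check that the running config really carries these values is to parse the main config file and compare. The following is only a minimal sketch: it assumes the file uses plain `key=value` lines with `#` comments, the helper names `parse_main_config` and `retention_mismatches` are made up for illustration, and the expected values are simply the ones quoted above.

```python
# Directives and values taken from the settings quoted in the post.
EXPECTED = {
    "retain_state_information": "1",
    "use_retained_program_state": "1",
    "use_retained_scheduling_info": "1",
}


def parse_main_config(text):
    """Parse key=value lines, skipping blank lines and # comments.

    Repeated keys keep only the last value, which is enough for the
    single-valued retention directives checked here.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def retention_mismatches(settings, expected=EXPECTED):
    """Return {directive: actual value or None} for every directive that differs."""
    return {
        key: settings.get(key)
        for key, wanted in expected.items()
        if settings.get(key) != wanted
    }
```

To use it, read the contents of nagios.cfg into a string, pass it to `parse_main_config`, and hand the result to `retention_mismatches`. An empty dictionary means the three directives are set as expected, so the cause has to be somewhere else.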

I have read in other posts that at shutdown status.dat should be replaced by status.sav, but that doesn’t happen. I believe that in my case retention.dat plays that role; it does get written, but the state is not restored from it. Here is an example:

[ROOT@emeanoc01:/usr/local/nagios/var] (94) # rm retention.dat

[ROOT@emeanoc01:/usr/local/nagios/var] (95) # ll st* ret*
-rw-rw-r-- 1 watchmon watch 210K Jul 18 13:32 status.dat

[ROOT@emeanoc01:/usr/local/nagios/var] (96) # nstop
Stopping network monitor: nagios OK ]

[ROOT@emeanoc01:/usr/local/nagios/var] (97) # ll st* ret*
-rw------- 1 watchmon watch 218K Jul 18 13:32 retention.dat

[ROOT@emeanoc01:/usr/local/nagios/var] (98) # nstart
Running configuration check… [PASSED]
Starting network monitor: nagios OK ]

[ROOT@emeanoc01:/usr/local/nagios/var] (99) # ll st* ret*
-rw------- 1 watchmon watch 218K Jul 18 13:33 retention.dat
-rw-rw-r-- 1 watchmon watch 186K Jul 18 13:33 status.dat

My questions are:

  1. Is my assumption correct that retention.dat is used instead of status.sav?
  2. Should this file be deleted automatically at startup (like status.dat at shutdown)?
  3. Why the heck doesn’t it just work? :frowning:

Thanks for any hints.

Clipper
:shock:

First off, you have forgotten what the retention file is used for. It stores the retained values of some settings, and those take precedence over the config files. For example, I can turn notifications on in the services.cfg file, but after a restart I still get no notifications.
See this for the reason why.
nagios.sourceforge.net/docs/1_0/ … ntion_file

The problem you are having is with the status file.
nagios.sourceforge.net/docs/1_0/ … tatus_file
This file is deleted every time Nagios stops and recreated when it starts.

Your other problem is with status.sav
nagios.sourceforge.net/docs/1_0/ … ntion_file
Make sure you have proper permissions also.
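
One way to check the permissions is a small script run as the account Nagios runs under (watchmon in the listing above). This is just a sketch under that assumption; `permission_summary` is a hypothetical helper name, and `os.access` only tells you what the user running the script can do.

```python
import os
import stat


def permission_summary(path):
    """Report the mode string, owner uid, and read/write access for the current user."""
    info = os.stat(path)
    return {
        "mode": stat.filemode(info.st_mode),  # same form as ls -l, e.g. -rw-------
        "uid": info.st_uid,                   # numeric owner of the file
        "readable": os.access(path, os.R_OK),
        "writable": os.access(path, os.W_OK),
    }
```

Calling it on the retention file path from your nagios.cfg should show both `readable` and `writable` as true when run as the Nagios user; if either is false, the file cannot be read back at startup or updated at shutdown.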

Hi Jakkedup,

Thanks for your answer, but unfortunately it is not relevant to my issue. I have done some more research on this, and here is some more info.

  1. I’m using 2.0, not 1.0, so retention.dat is the “new” name for status.sav in the sample configs.
  2. I fully understand what retention is for. In particular, when I stop the server and restart it 60 seconds later, I expect to get all my statuses back. I don’t: everything is pending and immediately rescheduled.
  3. At shutdown, status.dat is deleted and retention.dat is updated, which is correct.
  4. My config is set to retain pretty much everything, as posted in my first message, but it doesn’t.

I’m currently adding debugging messages in the code to find out why…

Clipper
:x