We are running Nagios version 3.0.1 and NSCA version 2.7.2.
We have a passive check that is started by cron on a Linux machine once a day. It has been working fine for more than two years. I hadn’t noticed it before, but at some point we started getting a second alert for this service approximately 4-5 minutes later that looks like this:
[1297958699] SERVICE ALERT: nts-probe-01;Backup;OK;HARD;1;(null)
I looked through the log archive and found that this second message appears on almost every day, but there have been days when it does not. I investigated a little further and found another passive service with a similar “ghost” message: the OK message arrives, and a couple of minutes later the “(null)” message follows. Last week, because of a different problem, we stopped Nagios and NSCA, deleted the retention.dat file and then restarted Nagios and NSCA. Since then the second message has not appeared.
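To see exactly on which days the “(null)” follow-up appeared, the log archive can be scanned for SERVICE ALERT entries. The following is a minimal sketch, assuming the log line format shown above (`[epoch] SERVICE ALERT: host;service;state;state type;attempt;output`). The function names `parse_alerts` and `ghost_alerts`, and the 10-minute pairing window `max_gap`, are hypothetical choices for illustration, not part of Nagios.

```python
import re
from typing import Iterable, List, NamedTuple, Tuple

# Matches log lines like:
# [1297958699] SERVICE ALERT: nts-probe-01;Backup;OK;HARD;1;(null)
# The last group is greedy so plugin output containing ';' is kept whole.
ALERT_RE = re.compile(
    r"^\[(\d+)\] SERVICE ALERT: ([^;]*);([^;]*);([^;]*);([^;]*);(\d+);(.*)$"
)


class ServiceAlert(NamedTuple):
    timestamp: int
    host: str
    service: str
    state: str
    state_type: str
    attempt: int
    output: str


def parse_alerts(lines: Iterable[str]) -> List[ServiceAlert]:
    """Return every SERVICE ALERT entry found in the given log lines."""
    alerts = []
    for line in lines:
        m = ALERT_RE.match(line.rstrip("\n"))
        if m:
            ts, host, svc, state, stype, attempt, output = m.groups()
            alerts.append(
                ServiceAlert(int(ts), host, svc, state, stype, int(attempt), output)
            )
    return alerts


def ghost_alerts(
    alerts: List[ServiceAlert], max_gap: int = 600
) -> List[Tuple[ServiceAlert, ServiceAlert]]:
    """Pair each '(null)' alert with the previous alert for the same
    host/service, if it arrived within max_gap seconds (hypothetical window)."""
    last = {}
    pairs = []
    for a in sorted(alerts, key=lambda x: x.timestamp):
        key = (a.host, a.service)
        prev = last.get(key)
        if a.output == "(null)" and prev is not None and a.timestamp - prev.timestamp <= max_gap:
            pairs.append((prev, a))
        last[key] = a
    return pairs
```

For example, `with open(path) as f: pairs = ghost_alerts(parse_alerts(f))` on each archived log file (the archive path depends on the local `log_archive_path` setting) lists every OK/“(null)” pair, so days with and without the ghost message can be compared.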
Despite the fact that the script works correctly and reports the correct status, someone in our department insists that something is “wrong” now that the second message no longer appears, and I have been tasked with finding an explanation. Does anyone know where this kind of message can come from, so that I can explain why it disappeared? I would be grateful for any information.
Thanks in advance!