Nagios checks all stopped for no reason


#1

Hi all

I've got a really strange issue here and wanted to find out if anyone has seen it before.

I’m running 3 instances of Nagios:

Machine 1 is in America
Machines 2 and 3 are in our datacenter in the UK, which is locked down so that it is only accessible via our local network

Each of the 3 runs the same setup:
Nagios 3.2.0
Nagios Plugins 1.4.13
Ndoutils 1.4b8
PNP4Nagios 0.4.14
Mrtg 2.16.2
However, machine number 3 also runs DNX.

This morning I came in and found that machines 1 and 2 had a last check time of 23.00 last night. I had a dig around and found that the next check time for host and service checks was set to 23.00 tonight.

If I forced a check of a host and its services, the normal pattern of checks resumed (every 10 mins).
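For anyone wanting to force the checks from a script rather than the web interface, here is a minimal sketch in Python. It only builds the external command lines (SCHEDULE_FORCED_HOST_CHECK and SCHEDULE_FORCED_HOST_SVC_CHECKS) and prints them. The helper name `forced_check_commands` is made up for illustration, and the host name is simply the one from the log line in this post. To actually submit them, the lines would need to be appended to the Nagios command file, whose path is whatever `command_file` is set to in nagios.cfg.

```python
import time


def forced_check_commands(host, when=None):
    """Build external command lines that force a host check and all of
    that host's service checks. `when` is an epoch time; defaults to now."""
    now = int(time.time())
    check_time = now if when is None else int(when)
    return [
        f"[{now}] SCHEDULE_FORCED_HOST_CHECK;{host};{check_time}",
        f"[{now}] SCHEDULE_FORCED_HOST_SVC_CHECKS;{host};{check_time}",
    ]


if __name__ == "__main__":
    # Only prints the lines; writing them to the command file is left out
    # on purpose because the path differs between installs.
    for line in forced_check_commands("Nagios-Central"):
        print(line)
```

This is a workaround sketch only. It does not explain why the scheduled checks stopped in the first place.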

To get everything running again I had to remove all hosts and restart Nagios, then add them again and do another restart.

I’ve been through the logs and can’t see anything, and likewise my syslog doesn’t show anyone logged onto the boxes at that time.

One thing I did notice is that the daily logs seem to set the host state at 0.00:

[1256338800] CURRENT SERVICE STATE: Nagios-Central;Check_Netstat;OK;HARD;1;OK - http_in is 0
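To read the timestamp in that line, a rough Python sketch like the one below splits the entry and converts the epoch value. The function name `parse_state_line` is made up for illustration, and BST is modelled as a fixed UTC+1 offset, which is an assumption made so that the example does not depend on a timezone database.

```python
from datetime import datetime, timedelta, timezone

# Assumption: BST treated as a fixed UTC+1 offset for this example only.
BST = timezone(timedelta(hours=1), "BST")


def parse_state_line(line):
    """Split '[epoch] KIND: a;b;c;...' into (epoch, kind, fields)."""
    stamp, rest = line.split("] ", 1)
    epoch = int(stamp.lstrip("["))
    kind, payload = rest.split(": ", 1)
    # The last field is free-text plugin output, so cap the number of splits.
    fields = payload.split(";", 5)
    return epoch, kind, fields


line = ("[1256338800] CURRENT SERVICE STATE: "
        "Nagios-Central;Check_Netstat;OK;HARD;1;OK - http_in is 0")
epoch, kind, fields = parse_state_line(line)

utc = datetime.fromtimestamp(epoch, tz=timezone.utc)
bst = utc.astimezone(BST)
print(kind, fields[:2])
print(utc.isoformat())  # 2009-10-23T23:00:00+00:00
print(bst.isoformat())  # 2009-10-24T00:00:00+01:00
```

So this particular entry is stamped at midnight local time while the clocks were still on BST, i.e. 23.00 in UTC terms.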

So has anyone else seen this kind of behaviour with Nagios 3.2.0?

One thing to note is that the machine which uses DNX was not affected.

Also, as we are UK based and we moved the clocks back on Sunday morning, I wonder if this may be to blame.
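To illustrate why the clock change is a plausible suspect, here is a small Python sketch of the arithmetic. It assumes the switch in question is the UK one on Sunday 25 October 2009 and again uses fixed offsets for BST and GMT. It says nothing about how Nagios actually schedules checks internally; it only shows that the day of the switch is 25 hours long, so adding a flat 86400 seconds to local midnight lands at 23.00 rather than at the next midnight.

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed offsets stand in for the real UK timezone rules.
BST = timezone(timedelta(hours=1), "BST")
GMT = timezone.utc


def switch_day_arithmetic():
    """Return (length of the switch day in seconds,
    local midnight + 86400 s expressed in GMT)."""
    start = datetime(2009, 10, 25, 0, 0, tzinfo=BST)  # midnight, still BST
    end = datetime(2009, 10, 26, 0, 0, tzinfo=GMT)    # next midnight, now GMT
    day_seconds = int((end - start).total_seconds())
    naive_next = (start + timedelta(seconds=86400)).astimezone(GMT)
    return day_seconds, naive_next


day_seconds, naive_next = switch_day_arithmetic()
print(day_seconds)             # 90000
print(naive_next.isoformat())  # 2009-10-25T23:00:00+00:00
```

The 23.00 result lines up with the stuck check times above, but that is only suggestive and not a confirmed diagnosis.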

Cheers

Mike


#2

I'm in the UK and also seeing the same problem, probably due to the BST -> GMT switch yesterday. See this thread:

viewtopic.php?f=60&t=5503&p=17861#p17854


#3

Cheers mate

Looks like a nasty bug, hope it gets fixed by next year :wink:

Mike