'Last Check' on Host Detail page way out of whack

jnojr · October 13, 2008, 11:18pm

Nagios 2.9. I currently have 39 hosts and 67 services monitored. When I go to the Host Detail page, the “Last Check” column shows times several days to several months old for most hosts. Two show N/A and Durations of 567d 2h 9m 16s. For every one of those hosts, I can look at View Status Detail For This Host and see that checks have been run in the past few minutes.

Where does Nagios get this info from, and how can I reset it so it’s accurate?

512 · October 14, 2008, 6:22am

Hi,
I am not sure, but I think this may help you.
Try changing the value of
command_check_interval

in your nagios.cfg file

512

Strides · October 14, 2008, 8:38am

Smells to me like a case of 2 nagios processes… you might like to stop nagios, and then do a killall -9 nagios to make sure everything is stopped, before restarting.

/HTH

/S

jnojr · October 14, 2008, 2:50pm

You have no idea how many times I’ve done that, and how much tweaking I’ve done to /etc/init.d/nagios in an attempt to keep it from happening.

This isn’t it.

jnojr · October 14, 2008, 2:53pm

[quote=“512”]
Hi,
I am not sure but i think this may help you.
Try by changing the value at
command_check_interval

in your nagios.cfg file[/quote]

command_check_interval=60s

Even if it was 60 days, it wouldn’t explain some items that say the “Last Check” was six months ago.

Strides · October 14, 2008, 9:07pm

Well it was just a thought. Have you stopped it and binned off retention.dat & status.dat? That should reset the beast…
…maybe
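
A minimal sketch, in Python, of moving the state files aside instead of deleting them, so they can be put back if the reset does not help. The helper name `move_aside` is hypothetical, and Nagios is assumed to be stopped before it runs.

```python
import os
import shutil
import time


def move_aside(paths, suffix=None):
    """Rename each existing file to <name>.<suffix> and return the new names.

    Files that do not exist are skipped. Nothing is deleted.
    """
    if suffix is None:
        # Timestamp suffix so repeated runs do not overwrite older backups.
        suffix = time.strftime("%Y%m%d-%H%M%S")
    moved = []
    for path in paths:
        if os.path.isfile(path):
            backup = "{}.{}".format(path, suffix)
            shutil.move(path, backup)
            moved.append(backup)
    return moved
```

It would be called with the paths of retention.dat and status.dat on your system; those locations are whatever state_retention_file and status_file point to in nagios.cfg.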

jnojr · October 15, 2008, 3:27pm

“Binned off”? Do you mean move them and restart nagios so they’ll be recreated?

I can certainly give that a try…

Strides · October 16, 2008, 9:40am

aye… everything will/should go back to “pending”

jnojr · October 17, 2008, 10:51pm

OK, I moved retention.dat, started nagios, everything went to Pending and then OK… and already, the Last Check times are stringing out throughout the day. Each individual service shows a time that looks reasonable, but on the Host Detail page, a few show the last few minutes, and the others are various times going back to this AM.
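
To see the mismatch outside the web interface, something like the following Python sketch could compare each host's own last_check with the newest last_check among its services, read straight from status.dat. The block names are an assumption: `host {` / `service {` for 2.x and `hoststatus {` / `servicestatus {` for later versions. The function names are made up for illustration.

```python
import re

# Block headers such as "host {" or "servicestatus {".
BLOCK_START = re.compile(r"^\s*(\w+)\s*\{\s*$")


def parse_status(text):
    """Return a list of (block_type, {key: value}) from status.dat text."""
    blocks = []
    current = None
    for line in text.splitlines():
        line = line.strip()
        m = BLOCK_START.match(line)
        if m:
            current = (m.group(1), {})
        elif line == "}":
            if current is not None:
                blocks.append(current)
            current = None
        elif current is not None and "=" in line:
            key, _, value = line.partition("=")
            current[1][key.strip()] = value.strip()
    return blocks


def last_checks(blocks):
    """Map host_name -> (host last_check, newest service last_check)."""
    result = {}
    for kind, fields in blocks:
        name = fields.get("host_name")
        if name is None or "last_check" not in fields:
            continue
        stamp = int(fields["last_check"])
        host_ts, svc_ts = result.get(name, (0, 0))
        if kind in ("host", "hoststatus"):
            host_ts = stamp
        elif kind in ("service", "servicestatus"):
            svc_ts = max(svc_ts, stamp)
        result[name] = (host_ts, svc_ts)
    return result
```

Feeding it the contents of status.dat gives one pair of Unix timestamps per host. A host whose first value is hours behind the second matches what the Host Detail page shows.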

Taius · October 30, 2008, 5:28pm

You are not alone… I didn’t try moving retention.dat and status.dat, but I have the exact same problem with Nagios and the reporting of checks. But my symptoms go deeper. I also run PNP and I lose my performance data; forced active checks do not run when scheduled.

I have cron jobs that restart nagios, nsca (it dies under xinetd and daemon for me), and now I think I need to add ndo2db to the periodic restart. I’m at wit’s end.

Albin · October 31, 2008, 12:42pm

The Last Check on the Host Detail page is the last check-host-alive check for that host. It has no connection with the regular service checks. The host check (check-host-alive) is executed only when all services fail (read: have Critical states) and when there is no Last Check information (as when you moved the retention.dat file and every host was in the Pending state).

So don’t get confused by the Last Check info on hosts: those checks are executed only under special conditions, not on a regular basis. Services are checked on a regular basis.

This is cut/pasted from the official 2.0 documentation, under Host definition:
nagios.sourceforge.net/docs/2_0/ … .html#host
check_interval: NOTE: Do NOT enable regularly scheduled checks of a host unless you absolutely need to! Host checks are already performed on-demand when necessary, so there are few times when regularly scheduled checks would be needed. Regularly scheduled host checks can negatively impact performance - see the performance tuning tips for more information. This directive is used to define the number of “time units” between regularly scheduled checks of the host. Unless you’ve changed the interval_length directive from the default value of 60, this number will mean minutes. More information on this value can be found in the check scheduling documentation.
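
As a worked example of the arithmetic in that passage, here is a tiny Python sketch. The function name is invented for illustration; the only facts used are that check_interval counts "time units" and that interval_length (default 60) is the length of one unit in seconds.

```python
def seconds_between_checks(check_interval, interval_length=60):
    """Seconds between regularly scheduled checks.

    check_interval is in "time units"; interval_length is the number of
    seconds per unit (60 by default, so one unit is one minute).
    """
    return check_interval * interval_length
```

With the default interval_length, a check_interval of 5 means a host check every 300 seconds, i.e. every five minutes.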