Nagios just stops doing checks. MySQL issue?

RichardA · March 19, 2010, 10:08am

This install has been running fine for a couple of years, but recently it has been stopping every week or so. There are no errors in the logs and the web interface works, but the Scheduling Queue doesn’t change.
Optimising the MySQL tables seems to fix it for a while, but I’d like to find out what is actually happening. I have phpmyadmin installed, but I don’t know which performance stats to look at, as I’m not a DBA.

luca · March 22, 2010, 6:16pm

How big are the MySQL tables? After a couple of years you could be hitting some MySQL limit… Are you sure there’s enough disk space?
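
One rough way to rule out the disk space question is to check the filesystem that holds the MySQL data directory. This is only a sketch: the path is an assumption (`/var/lib/mysql` is a common default, but this thread does not say where the data lives), and the 10% threshold is an arbitrary illustrative value.

```python
import shutil


def free_space_report(path="."):
    """Return (free_bytes, free_fraction) for the filesystem holding `path`."""
    usage = shutil.disk_usage(path)
    # Guard against odd filesystems that report a total of zero.
    fraction = usage.free / usage.total if usage.total else 0.0
    return usage.free, fraction


def is_low_on_space(path=".", min_free_fraction=0.10):
    """True if less than `min_free_fraction` of the filesystem is free."""
    _, fraction = free_space_report(path)
    return fraction < min_free_fraction


if __name__ == "__main__":
    # Assumption: replace "." with the real MySQL data directory,
    # for example "/var/lib/mysql" on many Linux installs.
    free, fraction = free_space_report(".")
    print(f"free: {free / 2**30:.1f} GiB ({fraction:.0%} of the filesystem)")
```

Running it against the actual data directory gives a quick yes/no answer before looking any deeper into MySQL itself.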

RichardA · March 23, 2010, 11:01am

The whole Nagios DB is 1.7GiB and nagios_servicechecks is 871MiB, according to phpmyadmin. Too big?

Should I empty each table and restart? I don’t need historical data, but I do need Nagios to read in all the cfg files and start up again properly.
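
The same size figures that phpmyadmin shows can be read from MySQL's `information_schema.TABLES` view. The sketch below only builds the query and formats the result rows; it does not open a connection. The schema name `nagios` in the usage comment is an assumption, and the `%s` placeholder assumes a driver that uses that parameter style (PyMySQL and mysqlclient do).

```python
# Query to run through a MySQL client library of your choice.
# Pass the schema (database) name as the single parameter.
SIZE_QUERY = """
SELECT TABLE_NAME, DATA_LENGTH, INDEX_LENGTH, DATA_FREE
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = %s
"""

MIB = 2**20


def summarise(rows):
    """Turn (name, data_length, index_length, data_free) rows into
    (name, total_mib, reclaimable_mib) tuples, largest table first."""
    out = []
    for name, data_len, index_len, data_free in rows:
        # These columns can be NULL for some table types; treat that as 0.
        total = (data_len or 0) + (index_len or 0)
        out.append((name, round(total / MIB, 1), round((data_free or 0) / MIB, 1)))
    return sorted(out, key=lambda r: r[1], reverse=True)


# Hypothetical usage with a DB-API cursor:
#   cursor.execute(SIZE_QUERY, ("nagios",))   # "nagios" is an assumed schema name
#   for name, total, free in summarise(cursor.fetchall()):
#       print(f"{name:40s} {total:10.1f} MiB  ({free:.1f} MiB reclaimable)")
```

A large `DATA_FREE` value on a table is the space that optimising can typically give back, which may help explain why optimising appears to help for a while.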

luca · March 23, 2010, 12:32pm

Back up the DB and try it… it shouldn’t take too long. But I don’t think 2 GB should be a limit, even if the tables are quite big.
No errors in the Nagios/MySQL/system/Apache logs when it stops?
I know you said there were no errors in the logs… I’m just checking that you looked at all the possible logs.

RichardA · March 29, 2010, 9:25am

I took a snapshot of the VM, stopped Nagios and used phpmyadmin to clear (not drop) all the tables. I should have optimised too, but forgot.
When I restarted Nagios it read the alert history and so on back in (from the logs, presumably), so I didn’t lose any state.
No issues since then, so I think I will do this a couple of times a year.
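
If this does become a twice-yearly job, the clear-and-optimise step could be scripted instead of clicked through in phpmyadmin. The sketch below only generates the SQL statements and does not run them. It assumes that "clear" corresponds to `TRUNCATE TABLE`; only `nagios_servicechecks` is named in this thread, so any other table names passed in are up to you. Nagios should be stopped and the DB backed up first, as described above.

```python
import re

# Only plain identifiers are accepted, so a stray name cannot inject SQL.
_IDENT = re.compile(r"[A-Za-z0-9_]+")


def maintenance_sql(tables):
    """Return TRUNCATE and OPTIMIZE statements for each table name."""
    statements = []
    for name in tables:
        if not _IDENT.fullmatch(name):
            raise ValueError(f"unexpected table name: {name!r}")
        # TRUNCATE empties the table but keeps its definition (clear, not drop).
        statements.append(f"TRUNCATE TABLE `{name}`;")
        # Included because optimising is mentioned above; after a TRUNCATE
        # there may be little left for it to reclaim.
        statements.append(f"OPTIMIZE TABLE `{name}`;")
    return statements


if __name__ == "__main__":
    for line in maintenance_sql(["nagios_servicechecks"]):
        print(line)
```

The printed statements can be reviewed and then pasted into the phpmyadmin SQL tab or piped to the `mysql` client.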

luca · March 29, 2010, 2:58pm

So it read everything from the logs and put it back into the MySQL tables?

RichardA · March 29, 2010, 3:22pm

I thought it would just load config stuff from the cfg files, but it does seem to have parsed the logs for notification history and so on. I’m not saying I haven’t lost anything, just that I haven’t lost anything that I can tell.

The server looks ok, but I won’t be sure until it has run for a few months without intervention.