This install has been running fine for a couple of years, but recently it has been stopping every week or so. There are no errors in the logs and the web interface works, but the Scheduling Queue doesn’t change.
Optimising the MySQL tables seems to fix it for a while, but I’d like to find out what is actually happening. I have phpmyadmin installed, but I don’t know which performance stats to look at, as I’m not a DBA.
How big are the MySQL tables? Maybe you are hitting some MySQL limit after a couple of years. Are you sure there’s enough disk space?
The whole Nagios DB is 1.7GiB and nagios_servicechecks is 871MiB, according to phpmyadmin. Too big?
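If you would rather not click through phpmyadmin, the same numbers can be read from MySQL’s `information_schema.TABLES`. Here is a minimal Python sketch that only builds the query and converts byte counts. The schema name `nagios` and both function names are my assumptions, and it deliberately does not connect to anything, so run the generated SQL with whatever client you normally use.

```python
# Sketch only: builds a size-report query for one MySQL schema.
# The schema name "nagios" used below is an assumption; use your own DB name.


def table_size_query(schema: str) -> str:
    """Return SQL listing each table's data + index size, largest first."""
    # Refuse anything that is not a plain identifier before embedding it.
    if not schema.replace("_", "").isalnum():
        raise ValueError(f"unexpected schema name: {schema!r}")
    return (
        "SELECT TABLE_NAME, DATA_LENGTH + INDEX_LENGTH AS total_bytes "
        "FROM information_schema.TABLES "
        f"WHERE TABLE_SCHEMA = '{schema}' "
        "ORDER BY total_bytes DESC;"
    )


def to_mib(n_bytes: int) -> float:
    """Convert a byte count to MiB, rounded to one decimal place."""
    return round(n_bytes / (1024 * 1024), 1)


if __name__ == "__main__":
    print(table_size_query("nagios"))
```

Sorting by `total_bytes` puts the table that dominates the database at the top, which makes it easier to see whether one table is doing all the growing.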
Should I empty each table and restart? I don’t need historical data, but I do need Nagios to read in all the cfg files and start up again properly.
Back up the DB and try it… it shouldn’t take too long. But I think 2 GB shouldn’t be a limit, even if they are quite big tables.
No errors in the nagios/mysql/system/apache logs when it stops?
I know you said no errors in the logs… just checking that you looked at all the possible logs.
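One way to make sure nothing was missed is to scan each log for the usual problem keywords around the time the queue stalls. This is a small Python sketch. The keyword list and the function names are assumptions on my part, and log paths differ per distro, so pass in your own.

```python
# Sketch only: pick out lines that look like problems from any log file.
# The keyword list is a guess at what is worth flagging, not a Nagios convention.

KEYWORDS = ("error", "warning", "critical", "failed", "unable")


def suspicious_lines(lines, keywords=KEYWORDS):
    """Return the lines containing any keyword, matched case-insensitively."""
    hits = []
    for line in lines:
        lowered = line.lower()
        if any(word in lowered for word in keywords):
            hits.append(line.rstrip("\n"))
    return hits


def scan_file(path):
    """Scan one log file; undecodable bytes are replaced rather than raising."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return suspicious_lines(handle)
```

Calling `scan_file()` on each of the nagios, mysql, system and apache logs in turn gives one short list per file to read through. Expect some noise, since ordinary alert lines can contain these words too.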
I took a snapshot of the VM, stopped Nagios and used phpmyadmin to clear (not drop) all the tables. I should have optimized too, but forgot.
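For anyone who wants to repeat this without phpmyadmin: "clear (not drop)" corresponds to `TRUNCATE TABLE` in MySQL, and the optimise step to `OPTIMIZE TABLE`. This rough Python sketch only generates the statements for review. The function name is made up, and the only table name taken from this thread is `nagios_servicechecks`. Stop Nagios and take a backup or snapshot first, as described above.

```python
# Sketch only: generate the SQL for emptying and then optimising tables.
# Nothing here talks to MySQL; review the output, then run it in your client.
import re

# Accept plain identifiers only, so nothing odd ends up inside the SQL.
_IDENT = re.compile(r"^[A-Za-z0-9_]+$")


def maintenance_statements(tables):
    """Return TRUNCATE statements, then OPTIMIZE statements, for each table."""
    tables = list(tables)
    for name in tables:
        if not _IDENT.match(name):
            raise ValueError(f"unexpected table name: {name!r}")
    statements = [f"TRUNCATE TABLE `{name}`;" for name in tables]
    statements += [f"OPTIMIZE TABLE `{name}`;" for name in tables]
    return statements


if __name__ == "__main__":
    # Only this table name comes from the thread; add your own as needed.
    for stmt in maintenance_statements(["nagios_servicechecks"]):
        print(stmt)
```

Printing the statements instead of executing them keeps a human in the loop, which seems sensible for something that throws data away.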
When I restarted Nagios it read the alert history and so on back in (from the logs, presumably), so I didn’t lose any state.
No issues since then, so I think I will do this a couple of times a year.
So it read everything from the logs and put it back in the MySQL tables?
I thought it would just load config stuff from the cfg files, but it does seem to have parsed the logs for notification history and so on. I’m not saying I haven’t lost anything, just that I haven’t lost anything that I can tell.
The server looks ok, but I won’t be sure until it has run for a few months without intervention.