I’ve been having an issue where nagios stops processing external commands after it has been running for a while. Right after I restart nagios, it accepts external commands without any trouble: I tail the nagios.cmd file and see the entries being put in there by my various scripts, and I tail /var/spool/nagios/nagios.log and see that nagios is processing them (EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT -> SERVICE ALERT -> SERVICE NOTIFICATION). However, after some amount of time (I haven’t figured out exactly how long, but it seems to be as little as half an hour), nagios stops processing my external commands.
The entries are still being placed into the nagios.cmd file, and nagios appears to be reading the file and removing the entries from it as usual, but nothing appears in nagios.log and no notifications are sent. I then tried sending a bunch of external commands through (all PROCESS_SERVICE_CHECK_RESULT, fwiw), and none of them resulted in a nagios.log entry or an alert being sent, even though they’re all Critical, no volatile checks (snmp traps).
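For anyone who wants to reproduce this, a test result can be pushed through by hand with something like the sketch below. It assumes the standard external command line format ([timestamp] PROCESS_SERVICE_CHECK_RESULT;host;service;return code;plugin output). The host name, service description, and command file path are placeholders, not values from my setup, so adjust them to whatever your configuration uses.

```python
import time


def format_passive_result(host, service, return_code, output, now=None):
    """Build one PROCESS_SERVICE_CHECK_RESULT line for the command file.

    return_code follows the usual plugin convention:
    0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN.
    """
    ts = int(time.time() if now is None else now)
    return f"[{ts}] PROCESS_SERVICE_CHECK_RESULT;{host};{service};{return_code};{output}\n"


def submit(cmd_file, line):
    """Write a single command line to the command file (a named pipe).

    Opening the pipe blocks until a reader (nagios) has it open.
    """
    with open(cmd_file, "w") as pipe:
        pipe.write(line)


if __name__ == "__main__":
    # Placeholder values: replace with a real host/service pair and the
    # command_file path from your own nagios.cfg.
    line = format_passive_result("trap-host", "snmp_trap", 2, "test critical")
    submit("/path/to/nagios.cmd", line)
```

If a line sent this way shows up as EXTERNAL COMMAND in nagios.log right after a restart but not once the stall has started, that at least rules out the submitting scripts as the cause.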
In any case, restarting nagios clears the problem, but that’s a workaround, not a fix.
Has anyone experienced something like this? Any clues as to where I might look to resolve this?