Hi,
I found something that looks like a bug.
It all started with the upgrade from Nagios 3.0.5 to 3.0.6. That release contains some security fixes, so it was a necessary upgrade.
After a successful upgrade I always perform a test to check that alerting still works.
I don't know any other way to do this than to change the IP address of a switch in the config file to one that does not exist (of course not a really important switch).
After a while Nagios detects that the switch is no longer available. This takes about 10 minutes, and then it starts alerting.
For alerting I use sms_client. The following appears in the Nagios log:
[02-09-2009 13:44:30] Warning: Contact 'nagiosadmin' host notification command '/bin/sms_client -q kpn:06xxxxxxxx "VDXgro PROBLEM: SWITCH01 is DOWN:"' timed out after 30 seconds
[02-09-2009 13:43:59] HOST NOTIFICATION: nagiosadmin;SWITCH01;DOWN;host-notify-by-pager;CRITICAL - Host Unreachable (10.xxx.xxx.xxx)
[02-09-2009 13:43:59] HOST ALERT: SWITCH01;DOWN;HARD;10;CRITICAL - Host Unreachable (10.xxx.xxx.xxx)
The sms_client log shows this:
Feb 09 13:43:59 [19656] : Dialing SMSC 06xxxxxxxx…
Feb 09 13:44:04 [19656] WARNING: read() Timeout
Feb 09 13:44:26 [19656] : Connection Established.
That's it, nothing more. It does not hang up the line and nothing else happens. I have to reset the modem to get it working again.
When I send a message from the Linux command line, like this, everything works fine:
sms_client kpn:06xxxxxxxx "test"
And the sms_client log:
Feb 10 10:50:23 [10353] : [000] kpn:06xxxxxxxx "test"
Feb 10 10:50:23 [10353] : Dialing SMSC 0653xxxxxx…
Feb 10 10:50:28 [10353] WARNING: read() Timeout
Feb 10 10:50:51 [10353] : Connection Established.
Feb 10 10:50:57 [10353] : Hangup…
Feb 10 10:51:00 [10353] : kpn Service Time: 37 Seconds
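For comparison: the sms_client logs report a service time of 36 to 37 seconds, while Nagios 3.0.6 says the notification command "timed out after 30 seconds". A minimal sketch for reproducing a similar limit from the command line, assuming GNU coreutils "timeout" is installed. The run_with_limit helper is hypothetical (not part of Nagios or sms_client), and I don't know which signal Nagios itself uses when it stops a command, so this is only an approximation:

```shell
#!/bin/sh
# Hypothetical helper (not part of Nagios or sms_client): run a command under
# a time limit, similar to the 30-second limit in the Nagios warning, and
# report whether it was killed or how long it took.
# Assumes GNU coreutils "timeout" is available.
run_with_limit() {
    limit="$1"
    shift
    start=$(date +%s)
    # timeout sends SIGTERM after $limit seconds and then exits with status 124
    timeout "$limit" "$@"
    status=$?
    end=$(date +%s)
    if [ "$status" -eq 124 ]; then
        echo "killed after ${limit}s (exit=124)"
    else
        echo "finished in $((end - start))s (exit=$status)"
    fi
    return "$status"
}
```

Usage, with the same command as the manual test: run_with_limit 30 sms_client kpn:06xxxxxxxx "test". If the modem hangs the same way after the wrapped command is killed, that could suggest the limit, rather than the way the command is started, is involved.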
I tried all kinds of things, such as changing the config file (COMMANDS.CFG).
Nothing worked.
Then I decided to roll back from 3.0.6 to 3.0.5.
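Besides COMMANDS.CFG, the 30 seconds in the "timed out" warning should correspond to the notification_timeout option in the main configuration file (nagios.cfg), which defaults to 30 seconds when it is not set. A small sketch to print the configured value; the path /usr/local/nagios/etc/nagios.cfg is only the usual source-install location and is an assumption here, and the show_notification_timeout helper is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: print the notification_timeout value from a Nagios
# main config file. The default path is an assumption (typical source
# install); pass another path as the first argument if needed.
show_notification_timeout() {
    cfg="${1:-/usr/local/nagios/etc/nagios.cfg}"
    if [ ! -r "$cfg" ]; then
        echo "cannot read $cfg" >&2
        return 1
    fi
    # Take the value from uncommented lines like "notification_timeout=30";
    # if the option appears more than once, show the last occurrence.
    value=$(sed -n 's/^[[:space:]]*notification_timeout[[:space:]]*=[[:space:]]*//p' "$cfg" | tail -n 1)
    if [ -n "$value" ]; then
        echo "$value"
    else
        # Option not present in the file, so the built-in default applies
        echo "30 (default, not set in $cfg)"
    fi
}
```

For example: show_notification_timeout /usr/local/nagios/etc/nagios.cfg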
That solved my problem. This is the log:
[02-10-2009 12:27:50] HOST NOTIFICATION: nagiosadmin;SWITCH01;DOWN;host-notify-by-pager;CRITICAL - Host Unreachable (10.xxx.xxx.xxx)
The sms_client log:
Feb 10 12:27:50 [20455] : Dialing SMSC 0653xxxxxx…
Feb 10 12:27:55 [20455] WARNING: read() Timeout
Feb 10 12:28:17 [20455] : Connection Established.
Feb 10 12:28:23 [20455] : Hangup…
Feb 10 12:28:26 [20455] : kpn Service Time: 36 Seconds
Feb 10 12:28:26 [20455] : [000] kpn:06xxxxxxxx "VDXGRO PROBLEM: SWITCH01 is DOWN: $"
Feb 10 12:28:26 [20454] : Total Elapsed Time: 36 Seconds
My best guess is that something is wrong with the way Nagios 3.0.6 executes the notification command.
Does anyone have the same problem?