I found something that looks like a bug.
It all started with the upgrade from Nagios 3.0.5 to 3.0.6. The new version has some security bug fixes, so this was a necessary upgrade.
After a successful upgrade I always perform a test to see if alerting is still working.
I don't know any other way to do that than to change the IP address of a switch in the config file to a non-existent one (of course it is not really an important switch).
After a while Nagios detects that the switch is no longer available. That takes 10 minutes, and then it starts alerting.
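The test setup described above could look roughly like the following host definition. This is only a sketch under assumptions: the template name generic-switch and the address 192.0.2.1 (from a range reserved for documentation) are placeholders, and max_check_attempts 10 is inferred from the attempt number 10 in the HOST ALERT log line in this post. The real definition may differ.

```cfg
define host {
    use                 generic-switch   ; assumed template name
    host_name           SWITCH01
    address             192.0.2.1        ; placeholder for a non-existent address
    max_check_attempts  10               ; failed checks before a HARD DOWN state
}
```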
For alerting I use sms_client. The following shows up in the Nagios log:
[02-09-2009 13:44:30] Warning: Contact ‘nagiosadmin’ host notification command ‘/bin/sms_client -q kpn:06xxxxxxxx “VDXgro PROBLEM: SWITCH01 is DOWN:”’ timed out after 30 seconds
[02-09-2009 13:43:59] HOST NOTIFICATION: nagiosadmin;SWITCH01;DOWN;host-notify-by-pager;CRITICAL - Host Unreachable (10.xxx.xxx.xxx)
[02-09-2009 13:43:59] HOST ALERT: SWITCH01;DOWN;HARD;10;CRITICAL - Host Unreachable (10.xxx.xxx.xxx)
The sms_client log shows this:
Feb 09 13:43:59  : Dialing SMSC 06xxxxxxxx…
Feb 09 13:44:04  WARNING: read() Timeout
Feb 09 13:44:26  : Connection Established.
That's it, nothing more. It does not hang up the line and nothing else happens. I need to reset the modem to get it working again.
When I send a message from the Linux command line like this, it all works fine:
sms_client kpn:06xxxxxxxx "test"
And the log of sms_client:
Feb 10 10:50:23  :  kpn:06xxxxxxxx "test"
Feb 10 10:50:23  : Dialing SMSC 0653xxxxxx…
Feb 10 10:50:28  WARNING: read() Timeout
Feb 10 10:50:51  : Connection Established.
Feb 10 10:50:57  : Hangup…
Feb 10 10:51:00  : kpn Service Time: 37 Seconds
I tested all kinds of things, like changing the config file (COMMANDS.CFG).
Then I decided to roll back from 3.0.6 to 3.0.5 again.
My problem was solved. This is the log:
[02-10-2009 12:27:50] HOST NOTIFICATION: nagiosadmin;SWITCH01;DOWN;host-notify-by-pager;CRITICAL - Host Unreachable (10.xxx.xxx.xxx)
Feb 10 12:27:50  : Dialing SMSC 0653xxxxxx…
Feb 10 12:27:55  WARNING: read() Timeout
Feb 10 12:28:17  : Connection Established.
Feb 10 12:28:23  : Hangup…
Feb 10 12:28:26  : kpn Service Time: 36 Seconds
Feb 10 12:28:26  :  kpn:06xxxxxxxx "VDXGRO PROBLEM: SWITCH01 is DOWN: $"
Feb 10 12:28:26  : Total Elapsed Time: 36 Seconds
My best guess is that there may be something wrong with how the command is executed…
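One setting that might be related, offered only as an untested assumption: the "timed out after 30 seconds" warning matches the default of the notification_timeout directive in nagios.cfg, while sms_client reports a total elapsed time of 36 seconds for a successful send. A minimal sketch of raising that limit follows. The value 60 is arbitrary, and it is not confirmed that this explains the difference between 3.0.5 and 3.0.6.

```cfg
# nagios.cfg (main configuration file)
# Maximum time in seconds a notification command may run before Nagios kills it.
# The default is 30; 60 is only an illustrative value.
notification_timeout=60
```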
Does anyone have the same problem?