How to disable Nagios alerts during maintenance


#1

Hi guys,

I have Nagios running in my office. The problem is that whenever I change the config and restart the service, I get SMS alerts. Is there a good way to put the whole Nagios setup into a maintenance mode, so that I don't get any SMS alerts during that time?

Today I replaced the SMS entry in contacts.cfg with a temporary email address.

define contact{
contact_name ABCSMS
alias ABCSMS
service_notification_period workhours
host_notification_period workhours
service_notification_options c,r
host_notification_options d,r
service_notification_commands notify-by-sms
host_notification_commands host-notify-by-sms

email 07---------.nqsgq----@24xgateway.com

email abc@officedomain.com
}
After making the change I restarted the Nagios service and tested it, and everything was fine. But when I put the SMS gateway back in contacts.cfg and restarted the Nagios service, I got 1000 SMS messages again, because after the restart Nagios sends alerts that these services/hosts are down.

If we make any changes in contacts.cfg, do we have to restart the Nagios service?

Many thanks in advance.

Sam


#2

A normal rule is that when you make modifications to the Nagios configuration files you have to restart Nagios. When you make modifications to the objects files, you just have to reload Nagios.

When I reload Nagios, it behaves as if it cannot read any host and all services are down, and then it sends me a huge number of SMS alerts. What I can't understand is why /etc/init.d/nagios reload behaves like this. It should just read the new changes in the config without breaking the existing sessions.

Any help?


#3

See the reload section of the /etc/init.d/nagios script:

reload|force-reload)
    printf "Running configuration check..."
    $NagiosBin -v $NagiosCfgFile > /dev/null 2>&1;
    if [ $? -eq 0 ]; then
        echo "done"
        if test ! -f $NagiosRunFile; then
            # No run file, so Nagios is not running: start it
            $0 start
        else
            NagiosPID=`head -n 1 $NagiosRunFile`
            if status_nagios > /dev/null; then
                # Running: send SIGHUP so it re-reads its configuration
                printf "Reloading nagios configuration..."
                killproc_nagios nagios -HUP
                echo "done"
            else
                $0 stop
                $0 start
            fi
        fi
    else
        #$NagiosBin -v $NagiosCfgFile
        echo " FAILED! Reload aborted. Check your Nagios configuration."
        exit 1
    fi


#4

Is it a good idea to use a timeperiod like this?

# 'nonworkhours' timeperiod definition

define timeperiod{
timeperiod_name nonworkhours
alias Non-Work Hours
sunday 00:00-24:00
monday 00:00-09:00,17:30-24:00
tuesday 00:00-09:00,17:30-24:00
wednesday 00:00-09:00,17:30-24:00
thursday 00:00-09:00,17:30-24:00
friday 00:00-09:00,17:30-24:00
saturday 00:00-24:00
}

That is, change the timeperiod in services.cfg and hosts.cfg from 24x7 to nonworkhours, and then change it back to 24x7 when everything is done?

But in that case too, when I put 24x7 back I'll have to reload the config, and then I'll get all the SMS alerts again :(

Any help?
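
One way to silence notifications without editing any config file (and so without a reload) is the Nagios external command interface. The sketch below is only an illustration under some assumptions: external commands must be enabled (check_external_commands=1 in nagios.cfg), the command file path shown is a guess and should be taken from the command_file setting in your own nagios.cfg, `date +%s` must be available on your system, and `nagios_cmd` is a hypothetical helper name, not something shipped with Nagios. DISABLE_NOTIFICATIONS and ENABLE_NOTIFICATIONS are the program-wide external commands.

```shell
#!/bin/sh
# Hypothetical helper: format one Nagios external command line and
# append it to the command file given as the first argument.
nagios_cmd() {
    cmd_file=$1
    command=$2
    # External commands are written as "[epoch_seconds] COMMAND".
    printf '[%s] %s\n' "$(date +%s)" "$command" >> "$cmd_file"
}

# Assumed location; use the command_file value from your nagios.cfg.
CMD_FILE=${CMD_FILE:-/var/spool/nagios/cmd/nagios.cmd}

# Usage (left commented out so sourcing this file has no side effects):
# nagios_cmd "$CMD_FILE" DISABLE_NOTIFICATIONS   # before maintenance
# nagios_cmd "$CMD_FILE" ENABLE_NOTIFICATIONS    # after maintenance
```

Because this changes only the running program state, the timeperiods in services.cfg and hosts.cfg can stay at 24x7.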


#5

Just a side note: do NOT use restart. Use reload, or stop followed by start.


#6

Yes, that's what I tried (/etc/init.d/nagios reload), but it broke all the connections and then I had to restart the service.
Any other comments?


#7

Check that data and status retention are enabled in nagios.cfg.
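
A quick way to run that check from the shell might look like the sketch below. The helper names `cfg_value` and `retention_enabled` are made up for illustration, and the sketch only reads the first uncommented `key=value` line it finds; it does not verify which occurrence Nagios itself would honour if a key is repeated.

```shell
#!/bin/sh
# Print the value of a "key=value" setting from a Nagios main config file.
# $1 = config file, $2 = setting name. Commented-out lines do not match.
cfg_value() {
    sed -n "s/^[[:space:]]*$2[[:space:]]*=[[:space:]]*//p" "$1" | head -n 1
}

# Succeed (exit status 0) only if state retention is switched on.
retention_enabled() {
    [ "$(cfg_value "$1" retain_state_information)" = "1" ]
}

# Usage (the path is an assumption; adjust to your installation):
# retention_enabled /etc/nagios/nagios.cfg && echo "state retention is on"
```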


#8

Can you please point out what exactly I have to check in nagios.cfg?

Here are the values from my nagios.cfg file:

# STATUS FILE
# This is where the current status of all monitored services and
# hosts is stored. Its contents are read and processed by the CGIs.
# The contents of the status file are deleted every time Nagios
# restarts.

status_file=/var/log/nagios/status.dat

# AGGREGATED STATUS UPDATES
# This option determines whether or not Nagios will
# aggregate updates of host, service, and program status
# data. Normally, status data is updated immediately when
# a change occurs. This can result in high CPU loads if
# you are monitoring a lot of services. If you want Nagios
# to only refresh status data every few seconds, disable
# this option.
# Values: 1 = enable aggregate updates, 0 = disable aggregate updates

aggregate_status_updates=1

# AGGREGATED STATUS UPDATE INTERVAL
# Combined with the aggregate_status_updates option,
# this option determines the frequency (in seconds!) that
# Nagios will periodically dump program, host, and
# service status data. If you are not using aggregated
# status data updates, this option has no effect.

status_update_interval=15

# TIMEOUT VALUES
# These options control how much time Nagios will allow various
# types of commands to execute before killing them off. Options
# are available for controlling maximum time allotted for
# service checks, host checks, event handlers, notifications, the
# ocsp command, and performance data commands. All values are in
# seconds.

service_check_timeout=60
host_check_timeout=30
event_handler_timeout=30
notification_timeout=30
ocsp_timeout=5
perfdata_timeout=5

# RETAIN STATE INFORMATION
# This setting determines whether or not Nagios will save state
# information for services and hosts before it shuts down. Upon
# startup Nagios will reload all saved service and host state
# information before starting to monitor. This is useful for
# maintaining long-term data on state statistics, etc, but will
# slow Nagios down a bit when it (re)starts. Since its only
# a one-time penalty, I think its well worth the additional
# startup delay.

retain_state_information=1

# STATE RETENTION FILE
# This is the file that Nagios should use to store host and
# service state information before it shuts down. The state
# information in this file is also read immediately prior to
# starting to monitor the network when Nagios is restarted.
# This file is used only if the preserve_state_information
# variable is set to 1.

state_retention_file=/var/log/nagios/retention.dat

# RETENTION DATA UPDATE INTERVAL
# This setting determines how often (in minutes) that Nagios
# will automatically save retention data during normal operation.
# If you set this value to 0, Nagios will not save retention
# data at regular interval, but it will still save retention
# data before shutting down or restarting. If you have disabled
# state retention, this option has no effect.

retention_update_interval=60

# USE RETAINED PROGRAM STATE
# This setting determines whether or not Nagios will set
# program status variables based on the values saved in the
# retention file. If you want to use retained program status
# information, set this value to 1. If not, set this value
# to 0.

use_retained_program_state=1

# USE RETAINED SCHEDULING INFO
# This setting determines whether or not Nagios will retain
# the scheduling info (next check time) for hosts and services
# based on the values saved in the retention file. If you
# If you want to use retained scheduling info, set this
# value to 1. If not, set this value to 0.

use_retained_scheduling_info=0


#9

It shouldn't be sending out alerts on restarts…


#10

But I get the alerts in both cases, whether I reload the service or restart it. Where did you learn that it shouldn't send alerts on restart?
Many thanks for reply.


#11

You are right, it shouldn't, but then I don't know what else is wrong in my case.

Format: retain_state_information=<0/1>
Example: retain_state_information=1
This option determines whether or not Nagios will retain state information for hosts and services between program restarts. If you enable this option, you should supply a value for the state_retention_file variable. When enabled, Nagios will save all state information for hosts and service before it shuts down (or restarts) and will read in previously saved state information when it starts up again.

0 = Don’t retain state information (default)
1 = Retain state information
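
Since the option only helps if the retention file actually gets written, one further thing that could be checked is whether the file named by state_retention_file exists and is non-empty. The following is a rough sketch, not a diagnosis: `retention_file_ok` is a hypothetical helper, and the idea that a missing or empty retention file explains the repeated alerts is an assumption that has not been confirmed in this thread.

```shell
#!/bin/sh
# Hypothetical check: does the retention file named in a Nagios main
# config exist and contain data?
# Returns 0 = present and non-empty, 1 = missing or empty, 2 = not configured.
retention_file_ok() {
    cfg=$1
    path=$(sed -n 's/^[[:space:]]*state_retention_file[[:space:]]*=[[:space:]]*//p' "$cfg" | head -n 1)
    if [ -z "$path" ]; then
        echo "state_retention_file is not set in $cfg"
        return 2
    fi
    if [ -s "$path" ]; then
        echo "retention file present: $path"
        return 0
    fi
    echo "retention file missing or empty: $path"
    return 1
}

# Usage (the path is an assumption; adjust to your installation):
# retention_file_ok /etc/nagios/nagios.cfg
```

If the file is missing, the permissions on its directory for the user Nagios runs as would be a reasonable next thing to look at.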