Relax Nagios!


#1

Hi,
:!: We don’t know why, but Nagios is paranoid. We’re receiving 10 HARD CRITICAL notifications per day, yet there is no problem on the server when we telnet to it. It seems Nagios alerts as soon as a service is unavailable on the FIRST attempt: I always see 1/4 in the notification report, never 4/4. We would like Nagios to relax before sending a notification just because it could not connect to a port once.

:?: Is it possible for Nagios to be on alert when an issue is encountered, but not necessarily print an alert notification?

:arrow: Ideally, after a connection problem, Nagios would enter a “verification” state and retry 4 times over the next 30 seconds before sending a notification.

:mrgreen: cause really, I like to sleep at night!
thanks!


#2

Take a look at your templates.cfg file.
For whatever “service” is alerting you right away, is it set to
use generic-service or not?

templates.cfg defines generic-service, with defaults for exactly what you want, e.g. try 3 times before alerting people.
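
As a rough sketch of where that retry behaviour lives, the relevant part of a generic-service template might look like the following. The values are illustrative assumptions, not copied from your file, and older Nagios versions spell the interval directives normal_check_interval and retry_check_interval:

[code]define service{
    name                 generic-service   ; template name that services reference with "use"
    max_check_attempts   3                 ; failed checks in a row before the state goes HARD
    check_interval       10                ; time units between checks while the service is OK
    retry_interval       2                 ; time units between re-checks while the state is SOFT
    register             0                 ; this is only a template, not a real service
}[/code]

Intervals are counted in units of interval_length from nagios.cfg, which defaults to 60 seconds. Notifications go out only once the state is HARD, so a service that fails once and then recovers on a retry should stay quiet.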


#3

thanks for your reply enigma!

:arrow: commands.cfg is currently set up as follows for the services Nagios gets paranoid about:

[code]# 'check_http' command definition
define command{
    name            check_http
    command_name    check_http
    command_line    /usr/local/nagios/libexec/check_http -H localhost -t 180 -p 80
}

# 'check_ssh' command definition
define command{
    command_name    check_ssh
    command_line    /usr/local/nagios/libexec/check_ssh -H localhost -t 180 -p 22
}[/code]

:arrow: While the others are pretty much set like this:

# 'check_smtp' command definition
define command{
    command_name    check_smtp
    command_line    $USER1$/check_smtp -H $HOSTADDRESS$ $ARG1$
}

:?:
thanks in advance!


#4

I think you are confused about what should be edited in Nagios.
You should leave the command definition as it was, something like:

# 'check_http' command definition
define command{
    command_name    check_http
    command_line    $USER1$/check_http $ARG1$
}

The $ARG1$ means that any arguments you want to pass to the check are supplied by the service definition, not hard-coded in the command.
So in your checkmyhost.cfg file (name it whatever you want, just make sure nagios.cfg points to it via cfg_dir or cfg_file)
you would then define a service:

define service{
    use                    generic-service    ; Name of service template to use
    host_name              myhost
    service_description    my localhost http site on port 80
    check_command          check_http!-H localhost -e "HTTP/1.1"
    contacts               nagiosadmin
}
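
For Nagios to pick the file up, nagios.cfg needs a line pointing at it. A minimal sketch, assuming your config lives under /usr/local/nagios/etc (a guess based on the libexec path in your post) and using a made-up directory name "myhosts" for the cfg_dir variant; adjust both to your install:

[code]# in nagios.cfg: load one object file ...
cfg_file=/usr/local/nagios/etc/checkmyhost.cfg

# ... or load every .cfg file found in a directory (hypothetical path)
#cfg_dir=/usr/local/nagios/etc/myhosts[/code]

After editing, running the nagios binary with -v against your nagios.cfg verifies the configuration before you restart.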

I think your problem is that you weren’t even using a defined service; you were modifying the definition of check_http, so it would alert you immediately. Create a checkmyhost.cfg (or whatever you want to call it), define a service for the check per my example above, and you’ll be fine. Oh, and fix your command definitions to use the variables again instead of hard-coded values.
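
If the template defaults are still too jumpy for you, the service can override them. A sketch of the "4 tries before notifying" behaviour you asked for, reusing my example service; the numbers are assumptions to tune, and older Nagios versions call retry_interval retry_check_interval:

[code]define service{
    use                    generic-service    ; inherit everything else from the template
    host_name              myhost
    service_description    my localhost http site on port 80
    check_command          check_http!-H localhost -e "HTTP/1.1"
    max_check_attempts     4                  ; notify only after 4 failed checks in a row
    retry_interval         1                  ; re-check every 1 time unit while the state is SOFT
    contacts               nagiosadmin
}[/code]

With the default interval_length of 60 in nagios.cfg, one time unit is a minute, so the four attempts here span a few minutes rather than 30 seconds. Getting the retries closer together would mean lowering interval_length, which rescales every other interval in your config too.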