Monitoring NT Services


#1

I have Nagios 2.0 installed on Fedora Core 4. I’m trying to monitor the status of MS SQL services on a Windows 2003 and a Windows 2000 server. I have installed NSClient on both servers that I want to monitor. Authentication is set up on my Nagios page and the ping test works fine, but the status for SQL just shows Unknown for both servers. When I run the check_nt command from the command line, specifying the MSSQLServer service, it shows the status as Started on both servers. Only the web interface shows Unknown. Anybody have any ideas?


#2

hello…

Here is how I do it. I am using Nagios 1.2 on Red Hat 9, with NRPE to do my checks.

I have installed NSClient++ (nscplus.sourceforge.net/) on my Windows 2000 and 2003 servers. Here is a copy from my checkcommands.cfg file:

# 'nt_service_sql' command definition

define command{
command_name nt_service_sql
command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -p 5666 -c checkServiceState -a ShowAll MSSQLSERVER
}

hope this helps.

g


#3

Thanks for the response. Is anyone doing this with check_nt? check_nrpe needs a lot of extra configuration on the client side, which I was trying to avoid. NSClient looks like it’s working fine: it gives me a status of “Started” for MSSQL when I run the check from the command line. The web interface is the only thing showing it as “Unknown”. Does anyone have any other suggestions as to why this would happen?


#4

If the command works on the command line while logged in as root, but doesn’t work when the nagios daemon runs it, then it’s usually due to one of these:

  1. Incorrect permissions on the check command, so the nagios user can’t find or execute the file.
  2. The nagios user doesn’t have the proper permissions to access the service you are attempting to contact.

So try this from a command line:
su - nagios
now execute the same command you did as root. Does it work?
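
While you are at it, look at the exit code and not just the text, since the exit code is what decides the state shown in the web interface (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN). A small sketch; state_name is just a made-up helper name, not something that ships with Nagios:

```shell
#!/bin/sh
# state_name (hypothetical helper): translate a plugin exit code
# into the state name Nagios displays for it.
state_name() {
  case "$1" in
    0) echo "OK" ;;
    1) echo "WARNING" ;;
    2) echo "CRITICAL" ;;
    3) echo "UNKNOWN" ;;
    *) echo "out of range" ;;
  esac
}

# Usage, as the nagios user, right after running your check by hand:
#   <the same command you ran as root>
#   state_name $?
```

If the text looks fine but the code is 3, the plugin itself is reporting UNKNOWN.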


#5

Yes, I tried it as the nagios user and it seems to execute just fine. I thought you might be right about that, but the results are below:

[nagios@localhost libexec]$ ./check_nt -H 192.168.x.x -p 1248 -v SERVICESTATE -d SHOWALL -l MSSQLSERVER
MSSQLSERVER: Started


#6

You need to show us your checkcommands.cfg definition, what $USER1$ is set to, and the services.cfg definition for the check. It’s most likely a syntax problem.
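
One way to check this yourself is to expand the macros by hand and run exactly what Nagios would run. A rough sketch, assuming $USER1$ and $HOSTADDRESS$ are the only macros in the line; expand_macros is a made-up helper name, and the path and address in the example are placeholders:

```shell
#!/bin/sh
# expand_macros (hypothetical helper): substitute $USER1$ and $HOSTADDRESS$
# into a command_line string copied from checkcommands.cfg.
#   $1 = raw command_line value
#   $2 = value of $USER1$ from resource.cfg
#   $3 = address of the host being checked
expand_macros() {
  printf '%s\n' "$1" |
    sed -e 's|\$USER1\$|'"$2"'|g' -e 's|\$HOSTADDRESS\$|'"$3"'|g'
}

# Example with placeholder values; substitute your own path and address.
expand_macros '$USER1$/check_nt -H $HOSTADDRESS$ -p 1248' \
  '/path/to/plugins' 'HOST_ADDRESS'
```

Run the expanded line as the nagios user and compare it, argument by argument, with the command that worked by hand.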


#7

Here are my config files as requested:

checkcommands.cfg

################################################################################
# COMMAND DEFINITIONS
#
# SYNTAX:
#
#   define command{
#       template      <templatename>
#       name          <objectname>
#       command_name  <commandname>
#       command_line  <commandline>
#       }
#
# WHERE:
#
# <templatename> = object name of another command definition that should be
#                  used as a template for this definition (optional)
# <objectname>   = object name of command definition, referenced by other
#                  command definitions that use it as a template (optional)
# <commandname>  = name of the command, as recognized/used by Nagios
# <commandline>  = command line
#
################################################################################

# 'check_ping' command definition

define command{
command_name check_ping
command_line $USER1$/check_ping -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$ -p 5
}

# 'check_nt' command definition

define command{
command_name check_nt
command_line $USER1$/check_nt -H $HOSTADDRESS$ -p 1248 -v SERVICESTATE SHOWALL -l $ARG1$
}

# 'check-host-alive' command definition

define command{
command_name check-host-alive
command_line $USER1$/check_ping -H $HOSTADDRESS$ -w 3000.0,80% -c 5000.0,100% -p 1
}

# 'check_sql' command definition

define command{
command_name check_sql
command_line $USER1$/check_nt -H $HOSTADDRESS$ -p 1248 -v SERVICESTATE SHOWALL -l MSSQLSERVER
}

services.cfg

# Generic service definition template

define service{
name generic-service ; The ‘name’ of this service template, referenced in other service definitions
active_checks_enabled 1 ; Active service checks are enabled
passive_checks_enabled 1 ; Passive service checks are enabled/accepted
parallelize_check 1 ; Active service checks should be parallelized (disabling this can lead to major performance problems)
obsess_over_service 1 ; We should obsess over this service (if necessary)
check_freshness 0 ; Default is to NOT check service 'freshness'
notifications_enabled 1 ; Service notifications are enabled
event_handler_enabled 1 ; Service event handler is enabled
flap_detection_enabled 1 ; Flap detection is enabled
process_perf_data 1 ; Process performance data
retain_status_information 1 ; Retain status information across program restarts
retain_nonstatus_information 1 ; Retain non-status information across program restarts

register			0	; DONT REGISTER THIS DEFINITION - ITS NOT A REAL SERVICE, JUST A TEMPLATE!
}

# Service definition

define service{
use generic-service ; Name of service template to use

hostgroup_name			servers
service_description		PING
is_volatile			0
check_period			24x7
max_check_attempts		3
normal_check_interval		5
retry_check_interval		1
contact_groups			support
notification_interval		120
notification_period		24x7
notification_options		w,u,c,r
check_command			check_ping!100.0,20%!500.0,60%
}

define service{
use generic-service ; Name of service template to use

host_name			wsrv1
service_description		MSSQLServer
is_volatile			0
check_period			24x7
max_check_attempts		3
normal_check_interval		5
retry_check_interval		1
contact_groups			support
notification_interval		120
notification_period		24x7
notification_options		w,u,c,r
check_command			check_sql
}

define service{
use generic-service ; Name of service template to use

host_name			wsrv2
service_description		MSSQLServer
is_volatile			0
check_period			24x7
max_check_attempts		3
normal_check_interval		5
retry_check_interval		1
contact_groups			support
notification_interval		120
notification_period		24x7
notification_options		w,u,c,r
check_command			check_sql
}

define service{
use generic-service ; Name of service template to use

host_name			wsrv3
service_description		MSSQLServer
is_volatile			0
check_period			24x7
max_check_attempts		3
normal_check_interval		5
retry_check_interval		1
contact_groups			support
notification_interval		120
notification_period		24x7
notification_options		w,u,c,r
check_command			check_sql
}

hosts.cfg

# Generic host definition template

define host{
name generic-host ; The name of this host template - referenced in other host definitions, used for template recursion/resolution
notifications_enabled 1 ; Host notifications are enabled
event_handler_enabled 1 ; Host event handler is enabled
flap_detection_enabled 1 ; Flap detection is enabled
process_perf_data 1 ; Process performance data
retain_status_information 1 ; Retain status information across program restarts
retain_nonstatus_information 1 ; Retain non-status information across program restarts

register			0	; DONT REGISTER THIS DEFINITION - ITS NOT A REAL HOST, JUST A TEMPLATE!
}

# 'wsrv1' host definition

define host{
use generic-host ; Name of host template to use

host_name		wsrv1
alias			wsrv1
address			192.168.x.x
contact_groups		support
check_command		check-host-alive
max_check_attempts	10
checks_enabled		1
notification_interval	120
notification_period	24x7
notification_options	d,u,r
}

# 'wsrv2' host definition

define host{
use generic-host ; Name of host template to use

host_name		wsrv2
alias			wsrv2
address			192.168.x.x
contact_groups		support
check_command		check-host-alive
max_check_attempts	10
checks_enabled		1
notification_interval	120
notification_period	24x7
notification_options	d,u,r
}

# 'wsrv3' host definition

define host{
use generic-host ; Name of host template to use

host_name		wsrv3
alias			wsrv3
address			192.168.x.x
contact_groups		support
check_command		check-host-alive
max_check_attempts	10
checks_enabled		1
notification_interval	120
notification_period	24x7
notification_options	d,u,r
}

resource.cfg

$USER1$=/usr/local/nagios/libexec


#8

[quote=“consoul”]Yes, I tried it with the nagios user and it seems to execute just fine, I thought you might be right about that but the results are below:

[nagios@localhost libexec]$ ./check_nt -H 192.168.x.x -p 1248 -v SERVICESTATE -d SHOWALL -l MSSQLSERVER
MSSQLSERVER: Started
[/quote]

According to the quote above, the command you ran on the command line is not the same as the one you just posted from your checkcommands.cfg file. So try that one:
[nagios@localhost libexec]$ ./check_nt -H 192.168.x.x -p 1248 -v SERVICESTATE SHOWALL -l MSSQLSERVER

It’s not going to work. You forgot the -d before SHOWALL in the command_line of both your check_nt and check_sql definitions.
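
For reference, the corrected entries would presumably look like this. It is a sketch based on the checkcommands.cfg you posted, with only -d added before SHOWALL and nothing else changed:

```
# 'check_nt' command definition
define command{
        command_name    check_nt
        command_line    $USER1$/check_nt -H $HOSTADDRESS$ -p 1248 -v SERVICESTATE -d SHOWALL -l $ARG1$
        }

# 'check_sql' command definition
define command{
        command_name    check_sql
        command_line    $USER1$/check_nt -H $HOSTADDRESS$ -p 1248 -v SERVICESTATE -d SHOWALL -l MSSQLSERVER
        }
```

Nagios has to be restarted (or reloaded) before it picks up the change.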


#9

Wow, funny how much of a difference one character makes. Works like a charm!

Thanks for your help jakkedup! You da man!