check_nt - check for multiple instances of a process?

merickson · July 24, 2009, 4:31pm

I recently set up Nagios and NSClient++ to monitor a network with about 25 Windows servers, and it has been absolutely wonderful.

One of the servers in question runs a very poorly written commercial non-free software product. In the interest of protecting the guilty, let’s call this product “OfficeMatch” for now.

OfficeMatch normally runs a process that we’ll call “match.exe” as a local user that we’ll call “wetadmin”. When OfficeMatch malfunctions (which, unfortunately, happens often*), it somehow manages to spawn a second instance of “match.exe” that runs as user “SYSTEM”. The two “match.exe” processes apparently end up fighting over a serial port that the system uses to record call data from the PBX, and as a result the server fails to do its job. However, OfficeMatch is not polite enough to notify us that it is malfunctioning; instead, it simply collects incomplete data and generates reports that bear little or no relationship to reality.

So, does anyone know of a simple way to use Check_nt and NSClient++ (or any other combination of Nagios plugins and clients) to check for multiple instances of a specific process on a Windows server? Of course, we can and do currently use check_nt to make sure that match.exe is running at all, but is there a way to make sure that it’s running only once?

Thanks!

Miles

* We are powerless to fix it ourselves because it is not open-source software.
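For setups where NSClient++'s own process checks aren't available, one rough alternative is a small custom plugin that counts instances itself. The Python sketch below is an assumption-laden illustration, not part of the original thread: it assumes it runs on the Windows server (for example as an NRPE external script, wiring not shown), uses the built-in `tasklist /FO CSV /NH` output, and treats exactly one instance as healthy. The function names `count_instances`, `evaluate` and `main` are hypothetical; only the exit codes (0 = OK, 2 = CRITICAL, 3 = UNKNOWN) follow the standard Nagios plugin convention.

```python
import csv
import io
import subprocess
import sys

# Standard Nagios plugin exit codes.
OK, CRITICAL, UNKNOWN = 0, 2, 3


def count_instances(tasklist_csv: str, image_name: str) -> int:
    """Count rows of `tasklist /FO CSV /NH` output whose first column is image_name."""
    target = image_name.lower()
    return sum(
        1
        for row in csv.reader(io.StringIO(tasklist_csv))
        if row and row[0].lower() == target
    )


def evaluate(count: int, image_name: str) -> tuple[int, str]:
    """Hypothetical policy: exactly one instance is OK; zero or several are CRITICAL."""
    if count == 1:
        return OK, f"OK: 1 instance of {image_name} running"
    if count == 0:
        return CRITICAL, f"CRITICAL: {image_name} is not running"
    return CRITICAL, f"CRITICAL: {count} instances of {image_name} running"


def main(image_name: str = "match.exe") -> int:
    try:
        # List every process as CSV without a header row.
        out = subprocess.run(
            ["tasklist", "/FO", "CSV", "/NH"],
            capture_output=True, text=True, check=True,
        ).stdout
    except (OSError, subprocess.CalledProcessError) as exc:
        print(f"UNKNOWN: could not run tasklist: {exc}")
        return UNKNOWN
    status, message = evaluate(count_instances(out, image_name), image_name)
    print(message)
    return status


# tasklist exists only on Windows, so only run the check there.
if __name__ == "__main__" and sys.platform == "win32":
    sys.exit(main(*sys.argv[1:2]))
```

Matching on the image name alone is deliberate here: the duplicate runs as a different user (“SYSTEM” rather than “wetadmin”), but counting every `match.exe` catches it regardless of owner.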

merickson · July 29, 2009, 5:38pm

Figured it out. It’s like this, using check_nrpe to run CheckProcState via NSClient++ with MaxCritCount=2:

    # 'check_officematch' command definition
    define command {
        command_name    check_officematch
        command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -p 5666 -c checkProcState -a ShowAll MaxCritCount=2 match.exe
    }

Note that MaxCritCount=2 means a count of 2 or more is critical. (The CheckProcState documentation says that a count greater than n is critical, but in practice it is n or more.)
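To make that off-by-one concrete, here is a small Python sketch of the two readings. It models only the comparison, not NSClient++ itself; the function names are hypothetical, and the `>=` behavior is the one reported in this thread, not something checked against a particular NSClient++ version.

```python
def critical_documented(count: int, n: int) -> bool:
    """Documented reading of MaxCritCount=n: critical only when count > n."""
    return count > n


def critical_observed(count: int, n: int) -> bool:
    """Reading observed in practice: critical when count >= n."""
    return count >= n


if __name__ == "__main__":
    # Compare both readings for MaxCritCount=2 across a few instance counts.
    for count in range(4):
        print(count, critical_documented(count, 2), critical_observed(count, 2))
```

Under the documented reading, flagging a second `match.exe` would take MaxCritCount=1; under the observed reading, MaxCritCount=2 already does it, which is why the command definition above uses 2.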