Monitoring Exchange 2007 CCR Replication Status


I’ve just spent a solid 4 hours getting this to work, so I thought I’d share…

Monitoring Exchange 2007 CCR Replication Status with Nagios, NSClient++, Powershell, Blood, Sweat & Tears

We have a pair of Exchange 2007 CCR Clusters and wanted a way to monitor the Replication status of the storage groups. Microsoft, in their infinite wisdom, have removed WMI support for monitoring Exchange 2007, preferring instead to make you use Powershell. So, for this project you will need:
[list]
[*]Nagios
[*]Check_NRPE
[*]NSClient++ 0.3.5
[/list]
Configure Nagios

Make sure you’ve got the Check_NRPE plugin in your libexec folder then add a new command definition to the commands.cfg like so:

[code]define command{
    command_name    check_exrep
    command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -u -t 120 -p 5666 -c check_exch
    }[/code]
Then set up service definitions and hosts/hostgroups as you would normally.
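
For anyone unsure what the service side looks like, here is a minimal sketch of a service definition that uses the check_exrep command above. The host name "exch-ccr-01" is hypothetical, and "generic-service" is assumed to be the template from the sample Nagios config; swap in whatever your own setup uses.

[code]define service{
    use                     generic-service             ; assumed template from the sample config
    host_name               exch-ccr-01                 ; hypothetical host name
    service_description     Exchange CCR Replication
    check_command           check_exrep
    }[/code]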

Configure NSClient++

In your [NSClient++ Folder]\Scripts folder, create a new powershell script file (.ps1) called “exrep.ps1” and put the following code inside:

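[code]# Collect any storage group copies whose status matches one of the alert patterns
$X = @(Get-StorageGroupCopyStatus | Select Identity,SummaryCopyStatus | Select-String -pattern "Suspended","Failed","Initializing")

# No matches means every copy is healthy
If (!$X) {write-host "OK";exit 0}
else {
# Print every unhealthy storage group first, then exit non-zero
foreach($value in $X){
write-host $value}
exit 1}[/code]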
You can remove any item from the list of patterns if you don’t want to be alerted for it.

Microsoft strike again with Powershell; if you just try and execute it remotely then it hangs, so you have to fudge it. Still in the Scripts folder, create a new batch file called “ex.bat” and put the following code inside:

[code]@echo off

cmd /c echo . | C:\Windows\System32\WindowsPowerShell\v1.0\PowerShell.exe -PSConsoleFile "C:\Program Files\Microsoft\Exchange Server\Bin\ExShell.psc1" -nologo -noprofile -noninteractive -Command "[NSClient++ Folder]\Scripts\exrep.ps1"[/code]
Next, open up your NSC.ini file and uncomment the “CheckExternalScripts.dll” line. In the [External Scripts] section, create a new entry for “check_exch” that points to “scripts\ex.bat”. You may also have to up the timeout in the [External Script] section as I found one of my Exchange servers took over 60 seconds to respond to the command - doubtless another Powershell “feature” :(
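
As a rough guide, the relevant parts of NSC.ini might end up looking something like the fragment below. This is a sketch, not a full config: the "command_timeout" key name and the value of 120 seconds (picked to match the -t 120 in the Nagios command) are assumptions, so check them against the comments in your own NSC.ini.

[code][modules]
CheckExternalScripts.dll

[External Script]
; assumed key name and value, in seconds
command_timeout=120

[External Scripts]
check_exch=scripts\ex.bat[/code]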

Finally, restart the NSClient++ service on the client machine and restart Nagios on the server. When your check next runs you should get an “OK” output. If any of your storage groups are not in a “Healthy” state you should get an output that looks like: [StorageGroup Name]; SummaryCopyStatus=[Status].

The only thing I haven’t managed yet is to get it to report as a Critical event instead of a Warning, but I think that’s a limitation of NSClient++'s External Command handling so I’ll have to keep working on that.
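
One possible avenue for the Critical problem, offered as an untested sketch rather than a confirmed fix: as far as I know, when a script is run through -Command, PowerShell.exe tends to hand back only 0 or 1 instead of the script's own exit code, which would explain why everything shows up as a Warning. Assuming that is the cause, changing exrep.ps1 to "exit 2" and passing the code through explicitly in ex.bat might do it:

[code]@echo off

rem Assumption: exrep.ps1 now ends with "exit 2" instead of "exit 1"
cmd /c echo . | C:\Windows\System32\WindowsPowerShell\v1.0\PowerShell.exe -PSConsoleFile "C:\Program Files\Microsoft\Exchange Server\Bin\ExShell.psc1" -nologo -noprofile -noninteractive -Command "& '[NSClient++ Folder]\Scripts\exrep.ps1'; exit $LASTEXITCODE"

rem Hand PowerShell's exit code back to NSClient++
exit %ERRORLEVEL%[/code]

Nagios treats a plugin exit code of 2 as Critical and 1 as Warning, so if the code survives the trip through the batch file the service should change state accordingly.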