Nagios Disk Space Monitor: Can it show the *amount* of free


#1

Greetings y’all. I’m a serious Nagios rookie.

I have Nagios up and running. I’m currently monitoring three servers. I’m getting alerts that the amount of free disk space (on Windows boxes) is low. Is there a way to configure the alerts to show me the amount (instead of just “Service: Disk Usage on D Drive”)? Or even the percent free/used?


#2

Hi

You need to make use of one or more of the following macros in your alert notifications:
$HOSTOUTPUT$
$LONGHOSTOUTPUT$
$SERVICEOUTPUT$
$LONGSERVICEOUTPUT$
If you take a look at nagios.sourceforge.net/docs/3_0/pluginapi.html, it explains exactly which portion of your plugin output is referred to by which macro.
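As a rough sketch of the rules on that page, assuming Nagios 3 behaviour, this Python function splits raw plugin output into the parts Nagios exposes as $SERVICEOUTPUT$, $LONGSERVICEOUTPUT$ and $SERVICEPERFDATA$. It is a simplification for illustration, not Nagios's own code, and split_plugin_output is a hypothetical name.

```python
def split_plugin_output(raw: str) -> dict:
    """Split raw plugin output into the parts the Nagios 3 plugin API
    describes (a simplified sketch, not Nagios source code)."""
    lines = raw.rstrip("\n").split("\n")
    # First line: short text, optionally followed by "| perfdata".
    first_text, _, first_perf = lines[0].partition("|")
    long_text = []
    perf_parts = [first_perf.strip()]
    in_perf = False  # once a later line contains "|", the remaining lines are perfdata
    for line in lines[1:]:
        if in_perf:
            perf_parts.append(line.strip())
        elif "|" in line:
            text, _, perf = line.partition("|")
            long_text.append(text.rstrip())
            perf_parts.append(perf.strip())
            in_perf = True
        else:
            long_text.append(line)
    return {
        "SERVICEOUTPUT": first_text.strip(),
        "LONGSERVICEOUTPUT": "\n".join(long_text),
        "SERVICEPERFDATA": " ".join(p for p in perf_parts if p),
    }


# One line of text plus perfdata, like the check_nt output later in this thread.
macros = split_plugin_output(
    "c:\\ - total: 4.01 Gb - used: 2.92 Gb (73%) - free 1.09 Gb (27%) "
    "| 'c:\\ Used Space'=2.92Gb;3.21;3.61;0.00;4.01"
)
print(macros["SERVICEOUTPUT"])    # c:\ - total: 4.01 Gb - used: 2.92 Gb (73%) - free 1.09 Gb (27%)
print(macros["SERVICEPERFDATA"])  # 'c:\ Used Space'=2.92Gb;3.21;3.61;0.00;4.01
```

For single-line output like this, $LONGSERVICEOUTPUT$ simply comes out empty.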

HTH

/S


#3

Thanks, Strides. I think I’m on to it now. I appreciate the help.


#4

Well I thought I was on to it. Can you clarify a couple things for me?

I have a copy of host-notify-by-email, and it works. What do I need to do to my check_nt_disk_C so it uses those parameters? Am I making sense? I’m losing something somewhere.

Do I need to create a new service check?
What’s the connection between the two? In other words, how does check_nt_disk_C call (if that’s the correct term) host-notify-by-email?


#5

Hi

Your check should already provide that info. If you run it from the command line you should see something like…

[root@localhost libexec]#./check_nt -H 10.240.7.145 -p ***** -s ***** -v USEDDISKSPACE -l c -w 80 -c 90
c:\ - total: 4.01 Gb - used: 2.92 Gb (73%) - free 1.09 Gb (27%) | 'c:\ Used Space'=2.92Gb;3.21;3.61;0.00;4.01
…and you should also be seeing "c:\ - total: 4.01 Gb - used: 2.92 Gb (73%) - free 1.09 Gb (27%)" in the ‘status information’ field for the service check in the GUI. If you don’t, then you probably need to change the check you are using. If all is well, it’s just a question of getting that string into the alert email. If you refer to the example on the previously provided link, you can see that the output above is pretty similar to “Plugin Output Examples: Case 2: One line of output (text and perfdata)”, which means that string is known to Nagios as the macro $SERVICEOUTPUT$.
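The part after the “|” is the performance data, which ends up in $SERVICEPERFDATA$ rather than $SERVICEOUTPUT$. As a rough illustration, assuming the 'label'=value[UOM];warn;crit;min;max layout from the plugin API page and plain numeric thresholds (no range syntax), here is a Python sketch that unpacks a single perfdata item. parse_perfdata_item is a hypothetical helper, not part of Nagios.

```python
import re

# 'label'=value[UOM];[warn];[crit];[min];[max]  (one perfdata item)
PERF_RE = re.compile(
    r"^'?(?P<label>[^'=]+)'?=(?P<value>-?[0-9.]+)(?P<uom>[A-Za-z%]*)"
    r"(?:;(?P<rest>.*))?$"
)


def parse_perfdata_item(item: str) -> dict:
    """Parse one perfdata item; assumes plain numbers for warn/crit/min/max."""
    m = PERF_RE.match(item.strip())
    if m is None:
        raise ValueError(f"not a perfdata item: {item!r}")
    rest = (m.group("rest") or "").split(";")
    rest += [""] * (4 - len(rest))  # pad any missing warn/crit/min/max fields

    def num(s: str):
        return float(s) if s else None

    return {
        "label": m.group("label"),
        "value": float(m.group("value")),
        "uom": m.group("uom"),
        "warn": num(rest[0]),
        "crit": num(rest[1]),
        "min": num(rest[2]),
        "max": num(rest[3]),
    }


item = parse_perfdata_item("'c:\\ Used Space'=2.92Gb;3.21;3.61;0.00;4.01")
print(item["label"], item["value"], item["uom"], item["warn"], item["crit"])
# c:\ Used Space 2.92 Gb 3.21 3.61
```

Here the warning and critical thresholds appear in Gb (3.21 and 3.61), not as the 80 and 90 percentages passed on the command line.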

So, in order to get that detail into an alert email, you need to add that macro to your “service-notify-by-email” command object somewhere, probably at the end. I have to say, though, that my notify-service-by-email command object already has it in, and the fact that you refer to “host-notify-by-email” makes me wonder if you are talking about v2 rather than v3, as the v3 version of the command object is named slightly differently: “notify-host-by-email”. As far as I can make out, though, $SERVICEOUTPUT$ should exist just fine in v2, as it was $LONGSERVICEOUTPUT$ that was added in v3 for multi-line support. This is what mine looks like anyway…

# 'notify-service-by-email' command definition
define command{
    command_name    notify-service-by-email
    command_line    /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\n\nService: $SERVICEDESC$\nHost: $HOSTALIAS$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\n\nDate/Time: $LONGDATETIME$\n\nAdditional Info:\n\n$SERVICEOUTPUT$" | /bin/mail -s "** $NOTIFICATIONTYPE$ Service Alert: $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$ **" $CONTACTEMAIL$
    }

Though, as I say, if your notify object already has $SERVICEOUTPUT$ in there and the problem lies with your check command not returning the data to Nagios in the first place, then nothing else will help until you sort that out by changing the way you perform the check.

HTH

/S


#6

Hi Strides,

Now I’m starting to get lost. In addition to being a Nagios rookie, I’m a Linux rookie, too.

I’m testing/running/configuring everything through Groundwork. Am I barking up the wrong tree by posting here?

Having said that, let me explain how I’m testing. In the GW front-end, I go to Configuration >> Services >> Service Check. From there, I click Check Command and select Check_nt_useddiskspace. Below that are the command definition, usage, command line and test. I don’t mean to insult your intelligence; I just want to make clear what I’m doing.

So if I’m reading your previous post correctly, would I put the command object at the end of “Command line”? Like this: check_nt_useddiskspace!c!90!95
/usr/bin/printf “%b” “***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\nHost: $HOSTNAME$\nState: $HOSTSTATE$\nAddress: $HOSTADDRESS$\nInfo: $LONGHOSTOUTPUT$\n\nDate/Time: $LONGDATETIME$\n” | /bin/mail -s “Host $HOSTSTATE$ alert for $HOSTNAME$!” -email-?


#7

If I run your command from the command line, I do get C:\ - total: 30.00 Gb - Used 9.99 Gb (33%) - free 20.00 Gb (67%) | 'C:\ Used Space '=9.99Gb;23.99;26.99;0.0

What’s this telling me? Am I configured correctly?


#8

[blockquote]I’m testing/running/configuring everything through Groundwork. Am I barking up the wrong tree by posting here? [/blockquote]

Yeah, that puts a different spin on things. I’m sorry, but I’m not familiar with how Groundwork overlays Nagios and therefore cannot tell you how to modify the alert notifications if it does it in a different way to Nagios.

However, having said that, there must be a degree of similarity in the underpinning infrastructure if you have such things as a host-notify-by-email command - do you have a service-notify-by-email command too I wonder?

Under a ‘normal’ Nagios architecture, when you set a check to run you also define a contact to whom the alert will go. Then, in the contact definition, where you specify the contact’s email or pager number or whatever, you define the type of notification to be used by referencing the appropriate command object, which for email is nominally something like “service-notify-by-email”. It’s that notification command object we would normally modify to include whatever detail we want in the notification, like the $SERVICEOUTPUT$ macro or whatever.

A lot of checks return more than just an “Everything’s OK” type phrase, some including performance data and other additional information, but the key thing a check returns is an “exit code” that says whether it finished in a good, warning or critical state. Nagios calls the check, and if the exit code indicates a problem with the service, it may then go on to call the command object to notify the contact of the issue. In other words: Nagios runs the check, the check comes back “screwed”, Nagios thinks ‘I’d better let the contact know…’, looks up which contact is defined for this service check, then looks at how that contact wants to be notified, and runs the appropriate command to make that happen.
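The chain described above (check, exit code, state, contact, notification command) can be modelled roughly like this. It is a deliberately simplified Python sketch, not Nagios’s actual logic: the Contact class and handle_check_result function are hypothetical, the example address is made up, and retries, notification periods and notification options are ignored. The exit code to state mapping is the standard plugin convention.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Plugin exit codes -> service states (standard Nagios plugin convention).
STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


@dataclass
class Contact:
    name: str
    email: str
    # Mirrors the service_notification_commands directive of a contact definition.
    service_notification_commands: str


def handle_check_result(exit_code: int, contact: Contact) -> Tuple[str, Optional[str]]:
    """Return (state, notification command to run, or None if no alert).

    Simplified: ignores retries, notification periods, options, escalations."""
    state = STATES.get(exit_code, "UNKNOWN")  # out-of-range codes treated as UNKNOWN here
    if state == "OK":
        return state, None
    return state, contact.service_notification_commands


admin = Contact("admin", "admin@example.com", "notify-service-by-email")  # hypothetical contact
print(handle_check_result(0, admin))  # ('OK', None)
print(handle_check_result(2, admin))  # ('CRITICAL', 'notify-service-by-email')
```

The point of the model is that the check itself never sends anything: it only produces an exit code, and the contact definition decides which command does the notifying.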

I hope that goes some way to explaining things a bit better, at least in the case of ‘vanilla’ Nagios. As I say, I suspect Groundwork overlays a bunch of stuff to make your life “easier”, but the underpinning infrastructure is probably similar. If you go into the ‘guts’ of the program files and look for something resembling config files (these usually end with a .cfg extension), you may find something containing the parameters that determine which check details (held in macros) are emailed out. Then maybe there’s a chance you can modify them to something more like what you need.

A word of caution, of course: never make changes without first backing things up, and if you still don’t see anything obvious or are uncomfortable with changing something, then it’s probably best to try to locate a dedicated Groundwork forum, or shout back here for help from someone who knows more about that particular infrastructure.

All the best

/S


#9

Thanks, Strides.

There is a service-notify-by-email command. I may have been looking at the wrong ‘notify’ macro. I was able to edit the HOST-notify-by-email command and get it to send alerts. After playing around with that for a while, I realized that host means (duh) host. So while it was sending alerts to me, it was the host status (host up or host down). I started messing with SERVICE-notify… and almost got it working when I decided to quit for the night. I’m reasonably sure I’m now on the right track.

Thanks for all the help. I surely do appreciate it, Stride!


#10

:smiley:


#11

I’m striking out on the Groundwork forum. I hope this is a question someone here can answer.

Here’s the scoop: I’m using Groundwork on top of Nagios. I’m getting alerts; that’s not the problem. I’d like Nagios to tell me how much used/free space I have in the body of the alert.

I’ve tweaked a number of the commands in numerous ways, to the point where I’m getting lost/(more) confused. I just can’t seem to accomplish what I’m looking for. I can get a generic alert telling me “hey, you’re low on disk space”, but not how low. Or, I can test a command and get what I’m looking for, but it doesn’t e-mail me.

I’ve tried to create a new “notify-by-email” using the following command line:
/usr/bin/printf “%b” “***** Nagios ***\n\nNotification Type: $NOTIFICATIONTYPE$\n\nService: $SERVICEDESC$\nHost: $HOSTALIAS$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\n\nDate/Time: $LONGDATETIME$\n\nAdditional Info:\n\n$SERVICEOUTPUT$" | /bin/mail -s " $NOTIFICATIONTYPE$ alert - $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$ **” -email-
I get the following output:

/usr/bin/printf “%b” “***** Nagios ***\n\nNotification Type: -\n\nService: service_desc\nHost: localhost\nAddress: localhost\nState: UP\n\nDate/Time: 2008-10-06 15:43:09\n\nAdditional Info:\n\n-" | /bin/mail -s " - alert - localhost/service_desc is UP **” -
You must specify direct recipients with -s, -c, or -b…
According to the NSClient doc, -s = password, -c = critical percent, and there’s nothing for -b. Have I satisfied the -c with “c -95”? I don’t think I need the -s, or do I? Am I looking in the wrong place for the switches?

I’m not sure what the -s, -c or -b switches do or where I put them in the command. Any ideas? If I run this from a command prompt, it works:
[root@localhost libexec]# ./check_nt -H 192.168.1.170 -v USEDDISKSPACE -l C -w 80 -c 90
I get the following results:
c:\ - total: 74.50 Gb - used: 69.72 Gb (94%) - free 4.78 Gb (6%) | ‘C:\ Used Space’=69.72Gb;59.60;67.05;0.00;74.50"

Here’s an example of where I can get what I’m looking for, but no alert:
Command line
$USER1$/check_nt -H wks146 -v USEDDISKSPACE -l C w -90 c -95 “** $NOTIFICATIONTYPE$ alert - $HOSTALIAS$/Diskspace is low " -email-
Test
/usr/local/groundwork/nagios/libexec/check_nt -H wks146 -v USEDDISKSPACE -l C w -90 c -95 "
- alert - Wks146/Diskspace is low **” -email-
C:\ - total: 74.50 Gb - used: 69.72 Gb (94%) - free 4.78 Gb (6%) | ‘C:\ Used Space’=69.72Gb;0.00;0.00;0.00;74.50
What’s the difference between a command and a macro and a plugin?
Do I need to have anything in the arguments box? If so, what?
All I need is to have this “C:\ - total: 74.50 Gb - used: 69.72 Gb (94%) - free 4.78 Gb (6%) | ‘C:\ Used Space’=69.72Gb;0.00;0.00;0.00;74.50” in the body of the alert.

Do I need to create a new service or command? What’s the difference between the two?

Perhaps I’m getting lost/confused in the command line syntax or the Arguments syntax. Can someone shed some light on that for me (in layman’s terms because I’m a Nagios and Linux rookie)? It might be best (at least for me) to take one thing at a time.

If I’m going in circles, please let me know. I’ll see if I can clarify. I think I’m going in circles because I’m chasing my tail. I hope I’ve not repeated myself too much or provided too much for one to make sense of it all.

Many, many thanks!!


#12

Hi again
Forgive me for asking, but at the end of the notify command
[blockquote]I’ve tried to create a new “notify-by-email” using the following command line:
/usr/bin/printf “%b” "***** Nagios *\n\nNotification Type: $NOTIFICATIONTYPE$\n\nService: $SERVICEDESC$\nHost: $HOSTALIAS$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\n\nDate/Time: $LONGDATETIME$\n\nAdditional Info:\n\n$SERVICEOUTPUT$" | /bin/mail -s " $NOTIFICATIONTYPE$ alert - $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$ " -email-. I get get the following output:[/blockquote]
where you indicate
-email-
, is that actually what it says, or something you have overtyped to hide your email address? I only ask because I thought this forum normally automagically hyperlinks anything it reads as an email address, like nobody@microsoft.com, but it doesn’t appear to have done so in this case. Also, the error “You must specify direct recipients with -s, -c, or -b…” is an error from /bin/mail saying it can’t find a valid recipient address… If -email- is actually the content of the notify command, then you might try changing it to the macro Nagios uses to pull the contact’s email address from the contact object, which is $CONTACTEMAIL$.

It might help if you post your original service-notify-by-email command object up here so we can see what might be missing and compare how it is constructed with the more familiar default nagios one…

I’ll try and help with some of the other questions…
[blockquote]
/usr/local/groundwork/nagios/libexec/check_nt -H wks146 -v USEDDISKSPACE -l C w -90 c -95 “** - alert - Wks146/Diskspace is low " -email-[/blockquote]
Putting “** $NOTIFICATIONTYPE$ alert - $HOSTALIAS$/Diskspace is low " -email-” on the end here won’t make it email you; the check plugin is only doing the check…
[blockquote]What’s the difference between a command and a macro and a plugin?[/blockquote]
A (check) plugin is an external (to Nagios) program that is run by, and interacts with, Nagios. So, you configure a service check on a host to check, say, drive C disk space (or whatever) on some Windows server, and the thresholds at which you want it to alert… Nagios itself doesn’t do the check; it passes all the required information on to the plugin, which in this case is called check_nt. Nagios tells it such things as the server hostname or IP address, which drive you want to check, and whatever warning and critical thresholds you specify. The plugin then asks the NSClient agent running on the Windows server for the disk space figures, and the plugin’s internal logic ascertains whether the results have broken any of your prescribed thresholds; this determines how the plugin exits, by setting the exit code as appropriate.
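The threshold step is simple arithmetic: the percentages you pass become fractions of the total disk size. The Python sketch below reproduces the figures in this thread’s perfdata (80% and 90% of 4.01 Gb are 3.21 and 3.61; of 74.50 Gb they are 59.60 and 67.05). It is a hedged approximation, not check_nt’s source: the function name is hypothetical and the “>=” comparison is an assumption. By the same arithmetic, the 0.00;0.00 thresholds in the output of the `w -90 c -95` test command look consistent with those warning and critical options (which lack their leading dashes) not having been picked up, although that is only an inference from the output.

```python
def disk_thresholds(total_gb: float, used_gb: float, warn_pct: float, crit_pct: float):
    """Turn percentage thresholds into Gb and pick a state (sketch; assumes >=)."""
    warn_gb = warn_pct / 100 * total_gb
    crit_gb = crit_pct / 100 * total_gb
    if used_gb >= crit_gb:
        state = "CRITICAL"  # a plugin would exit with code 2
    elif used_gb >= warn_gb:
        state = "WARNING"   # exit code 1
    else:
        state = "OK"        # exit code 0
    return round(warn_gb, 2), round(crit_gb, 2), state


print(disk_thresholds(4.01, 2.92, 80, 90))    # (3.21, 3.61, 'OK')
print(disk_thresholds(74.50, 69.72, 80, 90))  # (59.6, 67.05, 'CRITICAL')
```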
Then the plugin hands its results back to Nagios, in this case a long string of information like c:\ - total: 74.50 Gb - used: 69.72 Gb (94%) - free 4.78 Gb (6%) | ‘C:\ Used Space’=69.72Gb;59.60;67.05;0.00;74.50… Each time this data is returned, Nagios stores the first part of it in a macro, $SERVICEOUTPUT$. Other parts of the string or multi-line output may be stored in other macros, like $LONGSERVICEOUTPUT$ or $SERVICEPERFDATA$. Then the plugin’s exit code is examined and the state of the service is ascertained, be it OK, WARNING, CRITICAL or UNKNOWN, and this is stored in another macro, $SERVICESTATE$. So, think of macros as variables in programming that at any one time contain current information about your last service check; there are also many other macros holding information about the host or service being checked and so on…
Next, if Nagios determines that a warning or critical state has been reached on this check (depending on the configuration it may retry the check several times to make sure the service is indeed down, but for simplicity let’s assume that has already happened), it’ll go on to notify whatever contact is configured for this service check. The contact will have an email address and a particular “notify” type command associated with it, so Nagios will then run this ‘notify’ command… Like the plugin, the command that is run is something external to Nagios itself and, for flexibility, could do anything: send email, pager or SMS messages, rewrite an HTML web page, and so on. You don’t want to be writing a specific email for Nagios to send each time your C drive has a problem on server X, then one for your D drive on server X, then two more for server Y and so on; you’d quickly be overrun with notification definitions. So the notify command again makes use of those macros that store the information on the service, check state, host and whatnot, and you write just the one command. To use the previous example (every token wrapped in dollar signs is a Nagios macro):
[blockquote]# ‘notify-service-by-email’ command definition
define command{
command_name notify-service-by-email
command_line /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\n\nService: $SERVICEDESC$\nHost: $HOSTALIAS$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\n\nDate/Time: $LONGDATETIME$\n\nAdditional Info:\n\n$SERVICEOUTPUT$" | /bin/mail -s "** $NOTIFICATIONTYPE$ Service Alert: $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$ **" $CONTACTEMAIL$
}[/blockquote]
Essentially it’s one long one-liner email notification command. First we have /usr/bin/printf “%b”, which is a standard *nix command. It prints whatever follows as one big string, which becomes the body of the email, and it interprets ‘escaped’ characters, i.e. it replaces \n with a new-line character. Everything it prints (everything up to the pipe symbol, “|”) is piped through to /bin/mail. Before any of this runs, Nagios replaces every single macro with its current real content, i.e. $NOTIFICATIONTYPE$ becomes ‘PROBLEM’ and $SERVICESTATE$ becomes ‘WARNING’ or ‘CRITICAL’, etc. /bin/mail is run with the -s flag, which makes whatever follows it (the bit in quotes) the ‘subject’ of the email; likewise, the macros there are replaced by real current data. The last macro, $CONTACTEMAIL$, is important, as it tells /bin/mail who to send all this to, and Nagios replaces it with the email address of the appropriate contact for this check.
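To make the order of operations concrete, here is a Python sketch of what ends up being handed to printf and /bin/mail: macros are expanded first, then printf "%b" turns the \n escapes into real line breaks. The helper names, the service description and the email address are hypothetical, printf_b only handles \n and \t rather than everything %b supports, and leaving unknown macros untouched is a choice of this sketch, not a claim about Nagios.

```python
import re

# Matches macro tokens such as $SERVICEOUTPUT$ or $USER1$.
MACRO_RE = re.compile(r"\$([A-Z0-9_]+)\$")


def expand_macros(template: str, macros: dict) -> str:
    """Replace each $NAME$ token with its value (unknown tokens left as-is here)."""
    return MACRO_RE.sub(lambda m: macros.get(m.group(1), m.group(0)), template)


def printf_b(text: str) -> str:
    """Very small stand-in for printf "%b": only \\n and \\t are handled."""
    return text.replace("\\n", "\n").replace("\\t", "\t")


# Hypothetical values for one CRITICAL disk check on wks146.
macros = {
    "NOTIFICATIONTYPE": "PROBLEM",
    "SERVICESTATE": "CRITICAL",
    "HOSTALIAS": "wks146",
    "SERVICEDESC": "Disk Usage on C Drive",
    "SERVICEOUTPUT": "c:\\ - total: 74.50 Gb - used: 69.72 Gb (94%) - free 4.78 Gb (6%)",
    "CONTACTEMAIL": "admin@example.com",
}
body_template = ("***** Nagios *****\\n\\nNotification Type: $NOTIFICATIONTYPE$\\n"
                 "State: $SERVICESTATE$\\n\\nAdditional Info:\\n\\n$SERVICEOUTPUT$")
subject_template = ("** $NOTIFICATIONTYPE$ Service Alert: "
                    "$HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$ **")

# Macro expansion happens first; printf then interprets the escapes.
body = printf_b(expand_macros(body_template, macros))
mail_argv = ["/bin/mail", "-s", expand_macros(subject_template, macros),
             expand_macros("$CONTACTEMAIL$", macros)]
print(body)
print(mail_argv)
```

The last element of mail_argv is the recipient; if that slot holds something that is not an address, /bin/mail has nobody to send to.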

Hopefully, that should shed some more light on it, at least I can only hope it has not served to make you any more confused.

Regards

/S


#13

Well hello to you, too.

Since my last post, I’ve managed to make some progress. To answer your questions, though, -email- is just something I put there for hiding my eddress. And yes, it does put a fake eddress where you put your real eddress. At the time, I didn’t know that.
I have changed $CONTACTEMAIL$ to my eddress.

I think this might be the original notify-by-email. I’m having server trouble, and I can’t get to it (yet).
/usr/bin/printf “%b” “***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\nHost: $HOSTNAME$\nState: $HOSTSTATE$\nAddress: $HOSTADDRESS$\nInfo: $LONGHOSTOUTPUT$\n\nDate/Time: $LONGDATETIME$\n” | /bin/mail -s “Host $HOSTSTATE$ alert for $HOSTNAME$!” $CONTACTEMAIL$
I’ve tried a hundred different ways, so I don’t remember what all I changed. I’m pretty sure I tweaked things between the $ $s.

As I said, I have made some progress. I can test a command and get it to e-mail me. That’s the good side. The bad side is that it generates errors, and my Nagios box isn’t sending alerts.

Here’s what I have for disk space:
$USER1$/check_nt -H $HOSTADDRESS$ -p $USER19$ -v USEDDISKSPACE -l C w -90 c -95 | /bin/mail -s “USEDDISKSPACE alert for $HOSTNAME$!” smcpherson@e-chx.com
Here’s the error:
Error(s) executing /usr/local/groundwork/nagios/libexec/check_nt -H wks146 -p 1248 -v USEDDISKSPACE -l C w -90 c -95 | /bin/mail -s “USEDDISKSPACE alert for wks146!” smcpherson@e-chx.com

I’ve also tried this (and it works without errors, but it doesn’t generate an e-mail):
$USER1$/check_nt -H wks146 -p $USER19$ -v USEDDISKSPACE -l C w -90 c -95 | /bin/mail -s “USEDDISKSPACE alert for $HOSTNAME$!” smcpherson@e-chx.com

If I could get **one** of these to work properly, I might be able to figure out the rest.

Thanks, also, for the clarification of checks/commands/macros, too! It’s all starting to become (somewhat) clear.

Instead of being a rookie, maybe one day I’ll make it to “The Show.” :smiley:


#14

[blockquote]To answer your questions, though, -email- is just something I put there for hiding my eddress.[/blockquote]Super, just thought I’d check. Whether changing $CONTACTEMAIL$ to a real address is or isn’t causing a problem I don’t know; I can’t say I’ve ever done it… the ‘at’ symbol might throw a spanner in the works or it might not. I guess you’d know whether it still worked after you changed that one thing, though :slight_smile:

[blockquote]I think this might be the original notify-by-email. I’m having server trouble, and I can’t get to it (yet).
/usr/bin/printf “%b” “***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\nHost: $HOSTNAME$\nState: $HOSTSTATE$\nAddress: $HOSTADDRESS$\nInfo: $LONGHOSTOUTPUT$\n\nDate/Time: $LONGDATETIME$\n” | /bin/mail -s “Host $HOSTSTATE$ alert for $HOSTNAME$!” $CONTACTEMAIL$ [/blockquote]That looks like the host notify command to me, which’ll be sent if the whole host goes down. It’s the service-notify-by-email command that deals with the service check side, so that’s the one we need to take a look at (when you can get on the server again).

[blockquote]$USER1$/check_nt -H $HOSTADDRESS$ -p $USER19$ -v USEDDISKSPACE -l C w -90 c -95 | /bin/mail -s “USEDDISKSPACE alert for $HOSTNAME$!” -email-[/blockquote]Cool that it generates an email, though I’m a little surprised it works, especially as it complains about errors; anyway, that’s by the by. What happens here is that Nagios runs the above service check for your wks146 server, populating the appropriate macros with the information you configured for that server, i.e. $HOSTADDRESS$ gets replaced with the IP address of wks146, and $USER19$ I assume is configured as your server-side client’s port… So it runs that, gets the output back, then pumps the whole lot at /bin/mail, which emails it to you. So far, so groovy. The reason you are not getting alerts, I imagine, is that the /bin/mail part at the end, which by all accounts is working and sending the email, finishes ‘cleanly’, so its exit code is a normal happy exit code. I reckon Nagios sees this exit code from /bin/mail and thinks the service check is OK, regardless of what the plugin’s exit code was, and OK means no alerts.
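That theory is easy to check, because a shell pipeline’s exit status is normally that of its last command. Here is a small Python sketch, assuming a POSIX /bin/sh is available. The “plugin” is a stand-in that just prints a line and exits with 2 (CRITICAL), and cat stands in for /bin/mail so nothing is actually sent; exit_status is a hypothetical helper name.

```python
import subprocess


def exit_status(command: str) -> int:
    """Run a command line through /bin/sh and return the status the caller sees."""
    result = subprocess.run(command, shell=True, stdout=subprocess.DEVNULL)
    return result.returncode


# Stand-in for a plugin reporting CRITICAL (exit code 2).
plugin = "sh -c 'echo DISK CRITICAL; exit 2'"

alone = exit_status(plugin)
# cat stands in for /bin/mail: it reads the output and exits successfully.
piped = exit_status(plugin + " | cat")
print(alone, piped)  # 2 0
```

Run on its own, the check reports 2; piped into another command that succeeds, the same check reports 0, which Nagios would read as OK.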

So on to
[blockquote]I’ve also tried this (and it works without errors, but it doesn’t generate an e-mail):
$USER1$/check_nt -H wks146 -p $USER19$ -v USEDDISKSPACE -l C w -90 c -95 | /bin/mail -s “USEDDISKSPACE alert for $HOSTNAME$!” -email-[/blockquote]
Hmmm, not sure what’s going on here… it depends on the definition of ‘works’. What I’d first imagined is that Nagios can’t resolve wks146 as a host IP address, so it times out and passes nothing whatsoever in the way of an email body to /bin/mail, which might say ‘yeah OK, I’ll send nothing’ and exit with what Nagios sees as an OK status. On reflection, I would have thought /bin/mail would complain about that a bit more and at least exit with some sort of error code. I really can’t fathom that one, and I’m not near any flavour of Linux at the moment to try it and see…

As far as getting something working goes, you should be just fine with the check_nt command on its own, without the pipe into /bin/mail on the end.

That should be OK, and as far as sending alerts goes, the last exit code Nagios will see is the one from the plugin, so Nagios will/should use the service-notify-by-email command to mail out the alert just like it used to… Granted, it may well come out without any of the stuff you want, but that will need to be fixed (another thought occurred, see footnote *) by editing the service-notify-by-email command object, which with any luck will solve the problem, assuming Groundwork’s Nagios uses the same $SERVICEOUTPUT$ macro that normal Nagios does, and I can’t see why it wouldn’t.

[blockquote]Thanks, also, for the clarification of checks/commands/macros, too! It’s all starting to become (somewhat) clear.
Instead of being a rookie, maybe one day I’ll make it to “The Show.” [/blockquote]
No worries. Now you’re looking ‘under the hood’ it should all start dropping into place, and I’m sure it won’t be long before you’re back at the Groundwork forum solving everyone else’s problems for them :smiley:

But for now I think I’ve gone on long enough. I’m off to bed; it’s late o’clock.

Toodles

/S

* The check_nt plugin definitely comes back with the right data; we’ve already seen this by running it from the command line, so the question is why it wasn’t getting through to you in your original alert emails…
We’ve been looking at the possibility that the service-notify-by-email command is at fault, but it is possible that the original check_nt_disk_C command was configured to use some hooky plugin wrapper that didn’t call check_nt the way we did from the command line and was sending back only an exit code with no data. I think this is most unlikely, but it might be a possibility, not knowing exactly what it did. If you suddenly start getting the data back using that check_nt command above, then that was in fact the reason all along and there is nothing wrong with service-notify-by-email; we just needed to change the check to pure check_nt… As I say, unlikely, but I just realised that one thing I never asked is whether you are running any other checks that do come back with the plugin output in an alert email (like packet loss details and RTA for check_ping). Probably should have; it might have saved a lot of time. So, do you, and do they?