SMTP Check

Hello All -
I’ve been using the ‘check_smtp’ command to monitor our mail servers, and it has worked great. However, with increased volume on the mail servers, I sometimes get “false positives”, if you will, from this default check.

Let me explain. The ‘check_smtp’ query just tries to communicate with the SMTP port to see if it is accessible. If it doesn’t get a response within 10 seconds, it times out and generates a CRITICAL alert. That is all well and good, except that now that our volume has increased, the port is occasionally so busy that the check times out a few times a day.

My question is: can I configure the ‘check_smtp’ script to pass the actual output of the command to Nagios, as opposed to this:
CRITICAL - Socket timeout after 10 seconds

What I would like is for Nagios to let me know if there are “Too many connections” on a check, and to report Critical only if that happens three times in succession.

I know that the scripts can be altered, but will Nagios allow the output to be sent to the interface, or will it only report the state as ‘UNKNOWN’, ‘OK’, ‘CRITICAL’ or ‘WARNING’?

If anyone can think of a better solution for this, I am open to suggestions. Thanks in advance.

Can’t you add the output of the command to the current output?
Meaning something like:
CRITICAL - “output_string(s)”

Nagios can handle this. I believe it will display anything you give on your standard output. You can even add HTML code if you want.
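
To make that concrete, here is a minimal sketch of a wrapper that turns a command's exit code and first line of output into a status line for Nagios. It relies only on the usual plugin convention (exit code 0 = OK, 1 = WARNING, 2 = CRITICAL, anything else = UNKNOWN). The function names state_label, status_line and run_and_report are made up for this example and are not part of check_smtp.

```shell
#!/bin/sh
# Hypothetical helper functions; none of these names come from Nagios itself.

# Map a plugin exit code to the state label shown in the interface.
state_label() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        *) echo "UNKNOWN" ;;
    esac
}

# Print a status line of the form "STATE - message".
status_line() {
    printf '%s - %s\n' "$(state_label "$1")" "$2"
}

# Run any command, report the first line of its output as the status text,
# and pass the command's exit code through unchanged.
run_and_report() {
    output=$("$@" 2>&1)
    rc=$?
    first_line=$(printf '%s\n' "$output" | head -n 1)
    status_line "$rc" "$first_line"
    return "$rc"
}
```

A wrapper script would end with something like `run_and_report /path/to/some_check; exit $?`, so that Nagios sees both the text and the exit code.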

I’m using a 60-second timeout on high-load mail servers.
On the machine itself I have an SNMP check that counts the established SMTP connections and reports them to Nagios, with fairly high warning and critical thresholds. The results are graphed via nagiostat to see where the peaks lie and whether the warning and critical thresholds are sensible.

Luca

Luca,
Did you write your own script for that SMTP check or did you just edit the original? Can you send me the script that you use? Also, I haven’t used nagiostat yet. Is that something easy to set up?

So you have set up the check_smtp script on your mail server, report the actual connections back to Nagios, and then have them graphed? How does that work exactly? That sounds like something I would like to do here.

Thanks in advance for your assistance. :wink:

check_smtp hasn’t been modified. I only added the timeout parameter so that some servers, which sometimes take a while to respond, get a different value.
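
As a sketch of where such a timeout can go: the plugin's -t option takes the timeout in seconds, and it is passed on the plugin's command line inside a command definition rather than set as a host or service directive. The command name check_smtp_slow and the use of $ARG1$ for the timeout are assumptions for illustration, not my literal config.

```
# Hypothetical command definition; check_smtp_slow is a made-up name.
define command{
        command_name    check_smtp_slow
        command_line    $USER1$/check_smtp -H $HOSTADDRESS$ -t $ARG1$
        }
```

A service that needs the longer timeout would then use `check_command check_smtp_slow!60`.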

The SMTP connection count comes from a script on the machine itself:
netstat -a | grep smtp | grep ESTABLISHED | wc -l
Something like this counts the number of active connections.
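
A slightly more defensive variant of that one-liner, as a sketch. It assumes Linux-style netstat columns (local address in field 4, state in field 6) and, unlike the grep version, counts only connections to the local SMTP port, not outbound ones. The function names and the threshold helper are invented for this example.

```shell
#!/bin/sh
# Count established connections to the local SMTP port.
# Reads netstat output on stdin, so it can be tried against saved output.
# Matches the port either numerically (:25) or by service name (:smtp).
count_smtp_established() {
    awk '$1 ~ /^tcp/ && $4 ~ /[:.](25|smtp)$/ && $6 == "ESTABLISHED" { n++ }
         END { print n + 0 }'
}

# Hypothetical threshold helper: prints a Nagios-style status line and
# returns the matching plugin exit code (0 OK, 1 WARNING, 2 CRITICAL).
report_smtp_connections() {
    count=$1
    warn=$2
    crit=$3
    if [ "$count" -ge "$crit" ]; then
        echo "CRITICAL - $count established SMTP connections"
        return 2
    elif [ "$count" -ge "$warn" ]; then
        echo "WARNING - $count established SMTP connections"
        return 1
    fi
    echo "OK - $count established SMTP connections"
    return 0
}
```

Typical use would be `netstat -an | count_smtp_established`, with the result fed to report_smtp_connections along with whatever thresholds suit the server.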

In the exec part of snmpd.conf I inserted this script, associated with an OID; querying that OID returns the output of the script. It is sometimes a bit slow to respond, but that is fine if you don’t poll it very often. (Be sure not to use a random OID. Check for free OID ranges; usually you should find something in the enterprise range. Here somebody requested an enterprise ID, so we are using that range.)
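
Roughly, the snmpd.conf entry looks like the sketch below. It uses the net-snmp exec directive in its "exec MIBOID NAME PROG" form. The enterprise number 99999, the name smtpconns and the script path are placeholders, not the values we actually use.

```
# snmpd.conf (net-snmp). OID, name and path are placeholders:
# replace 99999 with your own registered enterprise number.
exec .1.3.6.1.4.1.99999.1 smtpconns /usr/local/bin/smtp_conns.sh
```

After restarting snmpd, walking that subtree (for example with snmpwalk against .1.3.6.1.4.1.99999.1) should show the script's output among the returned values. Newer net-snmp releases also offer an extend directive for the same purpose.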

Nagiostat looks quite complex, but if you aren’t afraid of regexes it’s no real big deal. Just like with Nagios, read the docs well and slowly, and after reading them once try to graph one value. Once you get how to create an RRD archive for your needs, it’s no big deal; having used MRTG helps a bit. (There are some nice RRD tutorials which explain heartbeat, step and so on well.)

Luca
Edited Thu Dec 22 2005, 09:46AM ]

So you have the script running on your mail server to check the current active connections, and you have that script report back to nagiostat?

I can create the script and have it report back every 5 minutes or whatever, but how do I feed that info into nagiostat on the Nagios server?

Also, are you graphing this using MRTG or RRDTool? I’ve been learning a little about RRDTool and I know how to create very basic RRD Archives.

If you are using MRTG for the graphing, what does your MRTG config look like for that?

Thanks!

He uses nagiostat. Download and install it. It comes with a README that tells you all you need to get it working with Nagios. In other words, he doesn’t use rrdtool directly. It’s nagiostat that uses rrdtool, and MRTG is not involved or needed at all, since nagiostat creates all the pretty graphs itself via rrdtool (rrdgraph and the other rrd commands).
Edited Fri Dec 23 2005, 09:19PM ]

jakkedup is right. The reporting is done via SNMP.
You can associate scripts with user-defined OIDs, so that when a particular OID is requested, the snmpd process runs the script and returns the result via SNMP.

Luca

Hi!!
Just curious, where do you add the timeout value? I tried in the host and service definitions and it doesn’t work.
Please help.
Thanks