Nagios newbie looking for a little help on "CRITICAL -


Hello. I’m having a problem with Nagios. Basically, another fellow set up Nagios to monitor some remote servers, and now that he is no longer here, I’ve been assigned the duty of maintaining Nagios. Unfortunately, I’m new to both Nagios and Linux, and while I do have some networking experience, I am by no means an expert. So with all this dirty stuff out of the way, here is what’s going on.

About a month ago, a fellow here (Andy) set up Nagios to monitor five servers. Four of them were remote, and one was a server on our local VLAN. All was well until this past week, when we noticed that 9 out of 10 services were returning the following alert:

CRITICAL - Socket timeout after 10 seconds

Only the four remote servers were returning this alert. The local server was all green/good. Of the 10 services we were monitoring on the remote servers, the only service that was returning a healthy signal was “Ping”. The remote servers were all up and running fine. I successfully logged into them via Remote Desktop, and all services looked to be running fine in the Task Manager. I just don’t know why I am now receiving this critical alert.

One thing I may need to add is that between the time Nagios was known to be working correctly and the time we realized it was returning these alerts, we switched T1 providers. Previously, we had a T1 provided by XO, and our VLAN sat behind an InGate firewall. Now, our provider is BroadWing, and our VLAN is sitting behind an EdgeMarc firewall. In order for the NSClient to communicate with our Nagios server, do I need to open up some ports on the firewall? Or am I barking up the wrong tree?



Most probably it’s the new firewall… possibly you had some port open on the old firewall… and not on the new one. Check which port NSClient runs on (I’m not sure which it is)… and see if you can get it running again.



Well, yeah, basically that message is saying that the plugin is timing out when trying to execute. It’s the same principle as getting a timeout message when trying to ping a machine. The check either isn’t getting through, or it’s getting through but the reply isn’t getting back.

So like Luca said, check your firewall.
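
If you want to confirm that before touching the firewall, here is a rough sketch of a TCP probe you could run from the Nagios box. It only tries to open a connection and reports whether the port answered, refused, or timed out. The function name probe_tcp is made up for this example, and the port numbers mentioned below are assumptions on my part (12489 is, as far as I know, the usual NSClient++ default and 1248 the check_nt default), so check what your NSClient is actually configured to listen on.

```python
import socket


def probe_tcp(host, port, timeout=10.0):
    """Try to open a TCP connection and classify the result.

    Returns "open" if the connection succeeded, "refused" if the remote
    side actively rejected it, "timeout" if nothing came back in time
    (typical of a firewall silently dropping packets), or an
    "error: ..." string for anything else (e.g. DNS failure).
    """
    try:
        # create_connection resolves the host and connects in one step;
        # the context manager closes the socket again right away.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        return "timeout"
    except ConnectionRefusedError:
        return "refused"
    except OSError as exc:
        return "error: {}".format(exc)
```

You would call it as something like probe_tcp("your-remote-server", 12489), with your own host name and port filled in. A "timeout" result while ping still works would fit the firewall theory, since the check packets are being dropped somewhere along the way; "refused" would point more toward the NSClient service or its port setting on the remote server.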