I have a host that is a member of a hostgroup. All the members of this hostgroup are monitored with some simple services like check_nt_cpuload, check_nt_uptime, etc. They all report fine except for one server, where these services go up and down randomly with “Socket Timeout” in the status information. I cannot figure out what is causing this issue. I’ve tried pinging the host from an ssh connection and it reports back fine with standard times (~0.200 ms). Any thoughts on this problem would be greatly appreciated.
I think it may be a network card issue, will confirm once I know for sure.
Grab a tool like Matt’s Traceroute (mtr) and run mtr x.x.x.x (replace the x’s with the problem host’s IP). See if you’ve got packet loss! Typically a bouncy service is caused by the check occasionally taking more than 10 seconds to run, but if it’s just the ping host check then your problem is likely not with Nagios!
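If mtr isn’t handy, a rough sketch like the one below can show whether the TCP connection itself is flaky. It just opens a connection to the agent port over and over and counts how many attempts fail or run long. The names probe_tcp and summarize are made up for this example, and the port is an assumption: pass whatever port your check_nt command is actually configured to use. The 10 second timeout mirrors the usual plugin timeout mentioned above.

```python
import socket
import sys
import time


def probe_tcp(host, port, attempts=20, timeout=10.0, pause=1.0):
    """Open a TCP connection `attempts` times; return a list of (ok, seconds)."""
    results = []
    for i in range(attempts):
        start = time.monotonic()
        try:
            # create_connection raises OSError (including timeouts) on failure
            with socket.create_connection((host, port), timeout=timeout):
                ok = True
        except OSError:
            ok = False
        results.append((ok, time.monotonic() - start))
        if pause and i < attempts - 1:
            time.sleep(pause)
    return results


def summarize(results):
    """Count failed attempts and report the slowest attempt in seconds."""
    failures = sum(1 for ok, _ in results if not ok)
    slowest = max((t for _, t in results), default=0.0)
    return {"attempts": len(results), "failures": failures, "slowest_s": slowest}


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical file name): python probe.py <host> <port>
    print(summarize(probe_tcp(sys.argv[1], int(sys.argv[2]))))
```

If only the problem server shows failures or very slow connects while the other hostgroup members come back clean, that points at the host or its network path rather than at Nagios.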
Turns out it was a problem with the network card. Thanks for the response.