Check host problem


#1

My hosts are not being checked as often as I would like. We had a link failure and I noticed that the new Nagios server I am building did not detect it. Then I realised some hosts don't seem to be checked at all, e.g.:

29-07-2006 19:46:13 0d 3h 5m 1s+ PING OK - Packet loss = 0%, RTA = 0.95 ms

This host hasn’t been checked since July! I have turned on aggressive host checking (use_aggressive_host_checking=1) but this is not making any difference.

I don't want to set check_interval on the host because the docs say this is a bad idea. The only other solution I can think of is to assign a PING service check to every host, so that the hosts are checked often enough.

Any other suggestions would be more than welcome!!

Thanks


#2

Think about it. If you have a service check defined for a host, and the service check does not fail, then why even bother doing a host check? The host MUST be up if the service is OK. That is why you do NOT set check_interval for hosts. If you really, really must see that July date change, then force a host check from the web interface (CGI).
The only time Nagios will check a host is when a service check fails. At that point, Nagios stops doing ANYTHING else and runs the host check.
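
As a rough sketch of what that looks like in the object config: the host below has a check_command but no check_interval, so its check is only run on demand. The host name, alias and address are made-up placeholders, and it assumes the generic-host template and the check-host-alive command from the sample config exist and supply the remaining required directives (contacts, notification settings and so on).

```
# Sketch only: example-router and 192.0.2.1 are placeholders.
# check_interval is deliberately left out, so no regular host checks are scheduled.
define host{
        use                     generic-host
        host_name               example-router
        alias                   Example router
        address                 192.0.2.1
        check_command           check-host-alive
        max_check_attempts      5
        }
```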


#3

OK, I will have to add a PING check to the hosts that have no services, as these are just routers that I want to make sure are up. Thanks
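
Something along these lines is what I have in mind. It is only a sketch: it assumes the generic-service template and the check_ping command from the sample config (first argument is the warning threshold, second the critical threshold), and the host name and thresholds are placeholders rather than recommended values.

```
# Sketch only: example-router and the thresholds are placeholders.
define service{
        use                     generic-service
        host_name               example-router
        service_description     PING
        check_command           check_ping!100.0,20%!500.0,60%
        }
```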


#4

A router usually has multiple interfaces. You really should be checking the status of these ports, and not simply pinging one of them.
Use check_snmp and check the ifOperStatus of each port.
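
A minimal sketch for one port, assuming a check_snmp command that passes its first argument straight through to the plugin, the generic-service template from the sample config, a read community of "public", and interface index 1 (all of these are assumptions to adjust for your own setup). The -r 1 regex matches the "up(1)" value of ifOperStatus.

```
# Sketch only: community string, interface index and host name are placeholders.
define command{
        command_name    check_snmp
        command_line    $USER1$/check_snmp -H $HOSTADDRESS$ $ARG1$
        }

define service{
        use                     generic-service
        host_name               example-router
        service_description     Port 1 Link Status
        check_command           check_snmp!-C public -o ifOperStatus.1 -r 1 -m RFC1213-MIB
        }
```

You would repeat the service definition for each interface index you care about.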


#5

Hmm, that's a bit of a tricky one. I'm not allowed to run checks on these router interfaces, as they are in France and are checked by the French team and their Nagios server, but I need them in mine to diagnose where problems come from on certain links (and for my pretty status map!). Politics…