Parent hosts


#1

Hi everyone,

We have a complex network. I need some guidance on what parent to use. Here is the scenario.

IP addresses mentioned here are just examples:

Nagios machine is at 10.0.0.2
Web server host I want to check is at 192.168.1.10

Traceroute from nagios to webserver

  1. 10.0.0.1
  2. 10.2.0.1
  3. 10.3.0.1
  4. 10.4.0.1
  5. 172.16.0.1
  6. 192.168.1.1
  7. 192.168.1.10 <- webserver

Traceroute from webserver to nagios

  1. 192.168.1.2
  2. 192.168.2.1
  3. 192.168.3.1
  4. 172.16.0.2
  5. 172.16.10.1
  6. 10.4.0.2
  7. 10.3.0.2
  8. 10.7.0.2
  9. 10.2.0.2
  10. 10.0.0.1
  11. 10.0.0.2 <- nagios box

As you can see, we are using dynamic routing protocols, so the forward and return paths differ. How should we monitor the webserver using parent support?

Any help will be greatly appreciated.

Thanks,

V1rt


#2

Looks like you have lots of work to do. You only need to worry about the route from nagios to the webserver. Everything in between nagios and the webserver should be checked if at all possible. If there are multiple routes to the host, all of them need to be defined in the dependency tree (i.e. parent/child relationships). Doing this will let you tell whether the webserver itself is down or whether the connection to it (routers, switches, etc.) has problems.
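
To illustrate why the parent/child tree separates "webserver is down" from "the path to it is broken", here is a minimal Python sketch of the reachability rule as I understand it from the Nagios documentation: a host that fails its check is DOWN if at least one parent is UP (or if it has no parents), and UNREACHABLE otherwise. The host names (`hop1` ... `hop6`, `webserver`) and the `responds` map are made up for the example; they stand in for the forward traceroute hops in post #1. This is only a model of the logic, not Nagios code.

```python
from typing import Dict, List

# Hypothetical host names for the forward traceroute hops in post #1.
PARENTS: Dict[str, List[str]] = {
    "hop1": [],              # 10.0.0.1, directly attached to the Nagios box
    "hop2": ["hop1"],        # 10.2.0.1
    "hop3": ["hop2"],        # 10.3.0.1
    "hop4": ["hop3"],        # 10.4.0.1
    "hop5": ["hop4"],        # 172.16.0.1
    "hop6": ["hop5"],        # 192.168.1.1
    "webserver": ["hop6"],   # 192.168.1.10
}


def classify(host: str, parents: Dict[str, List[str]], responds: Dict[str, bool]) -> str:
    """Return 'UP', 'DOWN' or 'UNREACHABLE' for one host (simplified model)."""
    if responds[host]:
        return "UP"
    host_parents = parents.get(host, [])
    # A failing host with no parents sits next to the monitoring box: it is DOWN.
    if not host_parents:
        return "DOWN"
    # If any parent is UP, the path is fine and the host itself is the problem.
    if any(classify(p, parents, responds) == "UP" for p in host_parents):
        return "DOWN"
    # Every parent is DOWN or UNREACHABLE, so the host cannot be judged.
    return "UNREACHABLE"


def status_report(parents: Dict[str, List[str]], responds: Dict[str, bool]) -> Dict[str, str]:
    """Classify every host in the parent map."""
    return {host: classify(host, parents, responds) for host in parents}


if __name__ == "__main__":
    # Assume hop5 has failed, so nothing behind it answers checks either.
    responds = {h: h in ("hop1", "hop2", "hop3", "hop4") for h in PARENTS}
    for host, state in status_report(PARENTS, responds).items():
        print(f"{host:10s} {state}")
```

In this model only the failed router is reported as DOWN; everything behind it is UNREACHABLE, which is what lets you see where the break actually is.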


#3

alaster is correct. If the network is your responsibility, which is the case with me, along with the http server, then you need to map out everything. See my post here to get an idea of what I’m talking about.
meulie.net/forum_viewtopic.php?21.1402


#4

Ok. Assuming we have this kind of traceroute from nagios to the host I am checking:

  1. 10.0.0.1
  2. 10.2.0.1
  3. 10.3.0.1 <- load balanced with 10.3.0.5
  4. 10.4.0.1
  5. 172.16.0.1
  6. 192.168.1.1
  7. 192.168.1.10 <- webserver

and

  1. 10.0.0.1
  2. 10.2.0.1
  3. 10.3.0.5 <- load balanced with 10.3.0.1
  4. 10.4.0.1
  5. 172.16.0.1
  6. 192.168.1.1
  7. 192.168.1.10 <- webserver

The link is load balanced. If I recall correctly, there can only be one parent for each child host. Sometimes when I run a traceroute, my packets are rerouted along the other path. How do we address this scenario?

Thanks.

V1rt


#5

Not correct! A host can have one parent or many. There may come a point in your network layout where you have a circular path from nagios’ point of view. For example:
router1 connects to router2, switch1 and switch2. router2 connects to router1, switch1 and switch2. Your packets may only ever flow through router1 to switch1 to a host on that switch, because of how the router forwards them, but from nagios’ point of view you have a circular network, and it doesn’t like that. So you can’t specify the parent/child relationships exactly as the network is laid out. At some point you just reverse the relationship, and then nagios is fine with it.
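
As a rough illustration of multiple parents, the Python sketch below merges the two load-balanced traceroutes from post #4 into one parent map, checks it for circular relationships, and prints host definitions using the comma-separated `parents` directive. The `host-<address>` naming scheme and the helper names are assumptions made for the example, and the rendered definitions leave out the other directives (or `use` template) a real host definition needs.

```python
from typing import Dict, List

# The two forward paths from post #4 (example addresses only).
PATH_A = ["10.0.0.1", "10.2.0.1", "10.3.0.1", "10.4.0.1",
          "172.16.0.1", "192.168.1.1", "192.168.1.10"]
PATH_B = ["10.0.0.1", "10.2.0.1", "10.3.0.5", "10.4.0.1",
          "172.16.0.1", "192.168.1.1", "192.168.1.10"]


def merge_parents(paths: List[List[str]]) -> Dict[str, List[str]]:
    """Map each hop to the hops seen directly before it in any path."""
    parents: Dict[str, List[str]] = {}
    for path in paths:
        previous = None
        for hop in path:
            entry = parents.setdefault(hop, [])
            if previous is not None and previous not in entry:
                entry.append(previous)
            previous = hop
    return parents


def has_cycle(parents: Dict[str, List[str]]) -> bool:
    """Return True if following parent links leads back to a host already on the path."""
    visiting, done = set(), set()

    def visit(host: str) -> bool:
        if host in done:
            return False
        if host in visiting:
            return True
        visiting.add(host)
        found = any(visit(p) for p in parents.get(host, []))
        visiting.discard(host)
        done.add(host)
        return found

    return any(visit(h) for h in parents)


def render_host(address: str, parent_addresses: List[str]) -> str:
    """Render a partial Nagios host definition (hypothetical naming scheme)."""
    lines = ["define host {",
             f"    host_name  host-{address}",
             f"    address    {address}"]
    if parent_addresses:
        lines.append("    parents    " + ",".join(f"host-{p}" for p in parent_addresses))
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    parents = merge_parents([PATH_A, PATH_B])
    if has_cycle(parents):
        raise SystemExit("circular parent relationship; reverse one link first")
    for address, parent_addresses in parents.items():
        print(render_host(address, parent_addresses))
```

With both paths merged, 10.4.0.1 ends up with two parents (10.3.0.1 and 10.3.0.5), so it does not matter which of the load-balanced routers a given packet goes through.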

Your case is precisely what we have here at our company: many paths to the same host from nagios. As a matter of fact, we have 3 paths at some points.

First, you must decide whether you need to be concerned about the routers in between nagios and the host. Are you responsible for those routers? If not, then if they break there isn’t anything you can do about it anyway, so why bother monitoring them? Example:
Let’s say we have nagios set up to monitor www.google.com for httpd. Simple task, but if I also include the routers between us and they break, I just have to call my ISP, and they are going to say the problem is with ATT’s routers and the net will be back online when they are done. So why bother pinging the routers? Simply ping google.com and query the httpd port, and if the ping fails then it’s the ISP’s problem to fix.

But if the routers are your responsibility, then you need to do what I outlined in the URL I posted above, i.e. map out on paper every single cable connection from your nagios pc all the way to the host you are monitoring. Every port, even eth0 on your nagios pc, becomes a host (along with a service check for ifOperStatus using check_snmp). When you are done, your status map will look like a network diagram showing that nagios-eth0 is plugged into switch1-port3, switch1-port23 is trunked to router1-port1 and switch1-port24 is trunked to router2-port1. Now we really don’t care which path your packets take, via router1 or router2, but we do care about the cables plugged into router1-port1 and router2-port1. If an SNMP query shows the ifOperStatus of one of those ports as down, then the cable is cut/broken or the router port is defective.
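
For the per-port ifOperStatus check, a small Python sketch of how the check_snmp command line might be assembled is shown below. The address 192.0.2.10, the `public` community string and the interface index 23 are placeholders, and the assumption that a port number equals its SNMP ifIndex will not hold on every device, so verify the index with an SNMP walk first. The `-r 1` regex is meant to match the value reported for an up port (`1` or `up(1)`).

```python
import shlex
from typing import List

# IF-MIB::ifOperStatus column; the interface index is appended per port.
IF_OPER_STATUS_OID = ".1.3.6.1.2.1.2.2.1.8"


def if_oper_status_check(address: str, if_index: int, community: str = "public") -> List[str]:
    """Build a check_snmp argument list that is OK only when the port reports up(1)."""
    return [
        "check_snmp",
        "-H", address,                                 # device that owns the port
        "-C", community,                               # SNMP community (placeholder default)
        "-o", f"{IF_OPER_STATUS_OID}.{if_index}",      # ifOperStatus.<ifIndex>
        "-r", "1",                                     # OK if the returned value matches "1"
    ]


if __name__ == "__main__":
    # Placeholder device address and ifIndex, e.g. for switch1-port23.
    print(shlex.join(if_oper_status_check("192.0.2.10", 23)))
```

The printed command could then go into a Nagios command definition, with the address and index supplied through macros instead of literals.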

If you are unclear about what I’m suggesting, then please ask. But as stated, you need to figure out whether these routers are your job, ATT’s, the ISP’s, or someone else’s.
[ Edited Fri Jun 03 2005, 05:49AM ]