Server Load Question

My company has really gotten into Nagios in a big way. We are rolling out service monitors for >750 hosts. I will end up with a minimum of 4 service checks per host, so I am looking at >2700 processes on a 15-minute cycle.

I am running Nagios 3 in a Debian VM on XenServer. It is the only VM on a server with 8 cores and 72 GB of RAM, a RAID 5 box with six 15,000 RPM drives. In other words, a reasonably good-sized Dell R710. I am rolling this out now and am up to about 1200 services with no apparent impact on the box. My average execution time is 4.496 seconds against 2700 active service checks in a 15-minute period.

Is there a metric I can watch on the server that will tell me how much headroom I have? If I fill up this server as planned, am I going to have performance problems? Would I be better off with two servers? I had presented this project as requiring two servers to minimize traffic between cages in our DC: we have two locations with ~375 machines per cage. The cages are connected by a fiber link, but we already pass quite a bit of data across that link, and the majority of it is more important than monitoring data. I am leaning toward another server but need some metric-based arguments for management. Any help is greatly appreciated.


With two servers you’ll still have traffic from one side to the other: one server runs the checks and sends the results to the second one, where they are listed as passive checks. So I’m not sure how much would be gained in terms of bandwidth savings.

Off the top of my head, the numbers you have shouldn’t get you into trouble… but you’ll need to search a bit; there are a couple of threads discussing servers with large numbers of checks.

Not a real answer… sorry.
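
For a rough sense of scale, the figures quoted in the question can be turned into an average number of check processes running at the same time. This is only a back-of-the-envelope sketch: the function name is made up for illustration, and it assumes checks are spread evenly across the 15-minute cycle and that the 4.496 second average execution time stays the same as more services are added, neither of which is guaranteed on a real box.

```python
def average_concurrent_checks(num_checks: int,
                              avg_exec_seconds: float,
                              cycle_seconds: float) -> float:
    """Estimate how many checks are running at once, on average.

    Uses Little's law: concurrency = arrival rate * time each check runs.
    Assumes checks are scheduled evenly over the cycle (an assumption,
    not something the original post states).
    """
    checks_per_second = num_checks / cycle_seconds
    return checks_per_second * avg_exec_seconds


if __name__ == "__main__":
    cycle = 15 * 60          # 15-minute cycle, in seconds
    avg_exec = 4.496         # average execution time quoted in the question

    # Figures taken from the question: ~1200 services today, 2700 planned.
    for label, checks in (("current", 1200), ("planned", 2700)):
        concurrent = average_concurrent_checks(checks, avg_exec, cycle)
        print(f"{label}: {checks} checks -> ~{concurrent:.1f} running at once")
```

With the planned 2700 checks this works out to about 13.5 checks in flight on average, and roughly 6 at the current 1200. Whether that is comfortable on 8 cores depends on what the checks actually do, so treat it as a starting point for the argument to management rather than a measurement.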