Nagios Current Load - Critical


#1

I’m receiving messages from Nagios (more and more frequently) about this but don’t know how to diagnose the problem (if there really is one) - can anyone shed light as to what the numbers below mean?

Thanks,
sg

Current Status: CRITICAL
Status Information: CRITICAL - load average: 6.52, 5.61, 4.93
Performance Data: load1=6.520;5.000;10.000;0; load5=5.610;4.000;6.000;0; load15=4.930;3.000;4.000;0;
Current Attempt: 4/4
State Type: HARD
Last Check Type: ACTIVE
Last Check Time: 02-15-2007 14:53:55
Status Data Age: 0d 0h 2m 11s
Next Scheduled Active Check: 02-15-2007 14:58:55
Latency: 0.064 seconds
Check Duration: 0.006 seconds
Last State Change: 02-15-2007 14:38:55
Current State Duration: 0d 0h 17m 11s
Last Service Notification: 02-15-2007 14:39:00
Current Notification Number: 4
Is This Service Flapping? N/A
Percent State Change: N/A
In Scheduled Downtime? NO
Last Update: 02-15-2007 14:56:00

Host 	Service 	Status 	Last Check 	Duration 	Attempt 	Status Information
sg02 	Current Load 	CRITICAL 	02-15-2007 14:53:55 	0d 0h 18m 16s 	4/4 	CRITICAL - load average: 6.52, 5.61, 4.93

#2

I also receive messages about processes: one reported CRITICAL with 270 processes running.


#3

Not much to diagnose really. Whichever box has that load on it has too much to do. You are overworking it.
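
As for what the numbers mean: the three load averages are the 1-, 5- and 15-minute averages, and each performance data item reads label=value;warn;crit;min. Here is a minimal Python sketch that decodes the line from the first post. It assumes the usual Nagios plugin perfdata layout and that a value above a threshold trips it; parse_perfdata and classify are hypothetical helper names, not part of Nagios.

```python
# Sketch: decode check_load performance data (label=value;warn;crit;min).
# Assumes the standard Nagios plugin perfdata layout. parse_perfdata() and
# classify() are hypothetical helpers written for this post, not Nagios APIs.

PERFDATA = (
    "load1=6.520;5.000;10.000;0; "
    "load5=5.610;4.000;6.000;0; "
    "load15=4.930;3.000;4.000;0;"
)


def parse_perfdata(text):
    """Return {label: (value, warn, crit)} for each perfdata item."""
    result = {}
    for item in text.split():
        label, _, rest = item.partition("=")
        fields = rest.split(";")
        # Only the first three fields are needed here: value, warn, crit.
        value, warn, crit = (float(f) for f in fields[:3])
        result[label] = (value, warn, crit)
    return result


def classify(value, warn, crit):
    """Assumed rule: above crit is CRITICAL, above warn is WARNING."""
    if value > crit:
        return "CRITICAL"
    if value > warn:
        return "WARNING"
    return "OK"


if __name__ == "__main__":
    for label, (value, warn, crit) in parse_perfdata(PERFDATA).items():
        state = classify(value, warn, crit)
        print(f"{label}: {value:.2f} (warn {warn:.2f}, crit {crit:.2f}) -> {state}")
```

With the values posted, only load15 (4.93 against a critical threshold of 4.00) is past its critical limit; load1 and load5 are only past their warning limits. The service reports the worst of the three, so the CRITICAL comes from sustained load over 15 minutes rather than a short spike.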


#4

I’m going to take a wild guess that this is the Nagios box itself. If so, I bet you are mostly using NRPE and/or active checks. In that case, reduce the number of active checks it runs by moving to a distributed monitoring setup.