Hi,
I have configured the Nagios web interface on a server called ZITA.
I have two servers that I want to monitor: wsphotonicsA and wsphotonicsB.
In the web interface, the status of both servers is shown as all green. If I shut down one of the servers, this is correctly shown.
However, the load of both servers is always shown as zero, and Nagios never detects the number of logged-in users (it always shows zero, except for 1 user that is sporadically detected). The number of processes is detected correctly.
The server load has been constantly 50% or more during the past two weeks, but Nagios doesn't detect it.
Extract from nagios.log:
[1260918000] CURRENT SERVICE STATE: wsphotonicsA;Current Users;OK;HARD;1;USERS OK - 0 users currently logged in
[1260918000] CURRENT SERVICE STATE: wsphotonicsA;PING;OK;HARD;1;PING OK - Packet loss = 0%, RTA = 0.10 ms
[1260918000] CURRENT SERVICE STATE: wsphotonicsA;Root Partition;OK;HARD;1;DISK OK - free space: / 34103 MB (71% inode=99%):
[1260918000] CURRENT SERVICE STATE: wsphotonicsA;SSH;OK;HARD;1;SSH OK - OpenSSH_4.3 (protocol 2.0)
[1260918000] CURRENT SERVICE STATE: wsphotonicsA;Swap Usage;OK;HARD;1;SWAP OK - 92% free (875 MB out of 956 MB)
[1260918000] CURRENT SERVICE STATE: wsphotonicsA;Total Processes;OK;HARD;1;PROCS OK: 21 processes with STATE = RSZDT
[1260918000] CURRENT SERVICE STATE: wsphotonicsB;Current Load;OK;HARD;1;OK - load average: 0.00, 0.00, 0.00
[1260918000] CURRENT SERVICE STATE: wsphotonicsB;Current Users;OK;HARD;1;USERS OK - 0 users currently logged in
[1260918000] CURRENT SERVICE STATE: wsphotonicsB;PING;OK;HARD;1;PING OK - Packet loss = 0%, RTA = 0.11 ms
[1260918000] CURRENT SERVICE STATE: wsphotonicsB;Root Partition;OK;HARD;1;DISK OK - free space: / 34103 MB (71% inode=99%):
[1260918000] CURRENT SERVICE STATE: wsphotonicsB;SSH;OK;HARD;1;SSH OK - OpenSSH_4.3 (protocol 2.0)
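In case it helps to compare the two hosts side by side, here is a rough Python sketch that splits such log entries into their fields and groups the plugin output per service. It assumes each entry sits on one line, as in the log file itself, and that the bracketed number is a Unix timestamp. The function names parse_service_state and outputs_by_service are made up for this example and are not part of Nagios.

```python
import re
from datetime import datetime, timezone

# Expected shape of one entry (assumption based on the extract in this post):
# [epoch] CURRENT SERVICE STATE: host;service;state;state_type;attempt;plugin output
LINE_RE = re.compile(
    r"^\[(?P<ts>\d+)\] CURRENT SERVICE STATE: "
    r"(?P<host>[^;]*);(?P<service>[^;]*);(?P<state>[^;]*);"
    r"(?P<state_type>[^;]*);(?P<attempt>\d+);(?P<output>.*)$"
)


def parse_service_state(line):
    """Return a dict of fields for one log entry, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    # Convert the epoch seconds into a timezone-aware UTC datetime.
    record["time"] = datetime.fromtimestamp(int(record.pop("ts")), tz=timezone.utc)
    record["attempt"] = int(record["attempt"])
    return record


def outputs_by_service(lines):
    """Group plugin output as {service: {host: output}}, skipping other lines."""
    table = {}
    for line in lines:
        record = parse_service_state(line)
        if record is not None:
            table.setdefault(record["service"], {})[record["host"]] = record["output"]
    return table


if __name__ == "__main__":
    sample = [
        "[1260918000] CURRENT SERVICE STATE: wsphotonicsA;Root Partition;OK;HARD;1;"
        "DISK OK - free space: / 34103 MB (71% inode=99%):",
        "[1260918000] CURRENT SERVICE STATE: wsphotonicsB;Root Partition;OK;HARD;1;"
        "DISK OK - free space: / 34103 MB (71% inode=99%):",
    ]
    for service, per_host in outputs_by_service(sample).items():
        for host, output in sorted(per_host.items()):
            print(f"{service:15} {host:13} {output}")
```

Run against the full nagios.log, this prints one row per service and host, which makes it easy to see where the two hosts report exactly the same values.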
Here is the cfg file that I use to configure the servers:
photonics@zita:~$ more /usr/local/nagios/etc/objects/wsphotonics.cfg
define hostgroup {
hostgroup_name calculation_servers
alias CALCULATION SERVERS
members wsphotonicsA, wsphotonicsB
}
define host {
use linux-server
host_name wsphotonicsA
alias wsphotonicsA
address 157.193.172.101
hostgroups calculation_servers
max_check_attempts 5
check_command check-host-alive
contact_groups admins
notification_interval 2
notification_period 24x7
notification_options d,u,r
}
define host {
use linux-server
host_name wsphotonicsB
alias wsphotonicsB
address 157.193.172.188
hostgroups calculation_servers
check_command check-host-alive
max_check_attempts 5
contact_groups admins
notification_interval 2
notification_period 24x7
notification_options d,u,r
}
###############################################################################
###############################################################################
# SERVICE DEFINITIONS - wsphotonicsA
###############################################################################
###############################################################################
# Define a service to "ping" wsphotonicsA
define service{
use local-service ; Name of service template to use
host_name wsphotonicsA
service_description PING
check_command check_ping!100.0,20%!500.0,60%
}
# Define a service to check the disk space of the root partition
# on the local machine. Warning if < 20% free, critical if
# < 10% free space on partition.
define service{
use local-service ; Name of service template to use
host_name wsphotonicsA
service_description Root Partition
check_command check_local_disk!20%!10%!/
}
# Define a service to check the number of currently logged in
# users on the local machine. Warning if > 20 users, critical
# if > 50 users.
define service{
use local-service ; Name of service template to use
host_name wsphotonicsA
service_description Current Users
check_command check_local_users!20!50
}
# Define a service to check the number of currently running procs
# on the local machine. Warning if > 250 processes, critical if
# > 400 processes.
define service{
use local-service ; Name of service template to use
host_name wsphotonicsA
service_description Total Processes
check_command check_local_procs!250!400!RSZDT
}
# Define a service to check the load on the local machine.
define service{
use local-service ; Name of service template to use
host_name wsphotonicsA
service_description Current Load
check_command check_local_load!5.0,4.0,3.0!10.0,6.0,4.0
}
# Define a service to check the swap usage on the local machine.
# Critical if less than 10% of swap is free, warning if less than 20% is free
define service{
use local-service ; Name of service template to use
host_name wsphotonicsA
service_description Swap Usage
check_command check_local_swap!20!10
}
# Define a service to check SSH on the local machine.
define service{
use local-service ; Name of service template to use
host_name wsphotonicsA
service_description SSH
check_command check_ssh
notifications_enabled 1
}
###############################################################################
###############################################################################
# SERVICE DEFINITIONS - wsphotonicsB
###############################################################################
###############################################################################
# Define a service to "ping" wsphotonicsB
define service{
use local-service ; Name of service template to use
host_name wsphotonicsB
service_description PING
check_command check_ping!100.0,20%!500.0,60%
}
# Define a service to check the disk space of the root partition
# on the local machine. Warning if < 20% free, critical if
# < 10% free space on partition.
define service{
use local-service ; Name of service template to use
host_name wsphotonicsB
service_description Root Partition
check_command check_local_disk!20%!10%!/
}
# Define a service to check the number of currently logged in
# users on the local machine. Warning if > 20 users, critical
# if > 50 users.
define service{
use local-service ; Name of service template to use
host_name wsphotonicsB
service_description Current Users
check_command check_local_users!20!50
}
# Define a service to check the number of currently running procs
# on the local machine. Warning if > 250 processes, critical if
# > 400 processes.
define service{
use local-service ; Name of service template to use
host_name wsphotonicsB
service_description Total Processes
check_command check_local_procs!250!400!RSZDT
}
# Define a service to check the load on the local machine.
define service{
use local-service ; Name of service template to use
host_name wsphotonicsB
service_description Current Load
check_command check_local_load!5.0,4.0,3.0!10.0,6.0,4.0
}
# Define a service to check the swap usage on the local machine.
# Critical if less than 10% of swap is free, warning if less than 20% is free
define service{
use local-service ; Name of service template to use
host_name wsphotonicsB
service_description Swap Usage
check_command check_local_swap!20!10
}
# Define a service to check SSH on the local machine.
define service{
use local-service ; Name of service template to use
host_name wsphotonicsB
service_description SSH
check_command check_ssh
notifications_enabled 1
}
What could cause Nagios to not detect the server load and the logged-in users?
With best regards,
Emmanuel Lambert