Passive Checks Problems


#1

Some months ago I installed Nagios 3.x.
The server is configured to receive passive checks only.
I have 4 servers polling services and sending 2000 checks via NSCA (in less than 6 minutes) to the Nagios core.
This works fine, but after some hours the performance info window (passive service check counters) says:
<= 1 minute: 0 (0.0%)
<= 5 minutes: 0 (0%)
<= 15 minutes: 0 (0.0%)
<= 1 hour: 1776 (100.0%)
The only counters that keep increasing are the external command counters in the Check Statistics window!
But Nagios doesn't update any more.
Nagios still receives passive checks, but it seems that external commands stop working!
The only thing that brings them up again is to stop Nagios and the NSCA listener, clear the contents of /usr/local/nagios/var/checkresult, and restart.
Any ideas?
TIA
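
Since the workaround involves clearing the checkresult directory, one thing that might help narrow this down is checking whether files pile up there before the counters freeze. Below is a minimal sketch, not a Nagios tool: the function name `stale_files` and the 300-second threshold are made up for illustration, and only the directory path comes from the post above.

```python
import os
import time


def stale_files(directory, max_age_seconds, now=None):
    """Return the sorted names of regular files older than max_age_seconds."""
    now = time.time() if now is None else now
    stale = []
    with os.scandir(directory) as entries:
        for entry in entries:
            # Age is measured from the file's last modification time.
            if entry.is_file() and now - entry.stat().st_mtime > max_age_seconds:
                stale.append(entry.name)
    return sorted(stale)


if __name__ == "__main__":
    # Path taken from the post; the threshold is an arbitrary assumption.
    path = "/usr/local/nagios/var/checkresult"
    if os.path.isdir(path):
        old = stale_files(path, 300)
        print(f"{len(old)} file(s) older than 300 s in {path}")
```

Running it from cron every few minutes and logging the count would show whether the backlog grows steadily or appears all at once.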


#2

It has happened again recently. Any ideas? Maybe the Linux TCP stack config?


#3

Hey there! Does this happen consistently (i.e. a couple of hours and you get crap performance after every restart)?
In your nagios.cfg, what did you set “service_perfdata_file_mode” to? You’re going to want it =w or your perfdata file will get so huge that it’s unmanageable. I know there was a bug in 2.6x and prior that switched the =w and =a operations, but I don’t think this is an issue for 3.x…

Other than that, I would double-check each configuration option that you need to set for a passive Nagios server as specified in the docs (nagios.sourceforge.net/docs/3_0/distributed.html).
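
If it helps with the double-checking, here is a rough sketch of pulling a single option out of nagios.cfg-style text. The helper name `read_nagios_option` and the sample excerpt are hypothetical, and returning the last assignment when a key appears more than once is an assumption of this sketch, not documented Nagios behaviour.

```python
def read_nagios_option(cfg_text, key):
    """Return the last value assigned to `key` in key=value text, or None."""
    value = None
    for raw in cfg_text.splitlines():
        line = raw.strip()
        # Skip blank lines and full-line comments.
        if not line or line.startswith("#"):
            continue
        name, sep, rest = line.partition("=")
        if sep and name.strip() == key:
            value = rest.strip()
    return value


# Hypothetical excerpt, not the poster's real config.
sample = """
# performance data settings
service_perfdata_file=/usr/local/nagios/var/service-perfdata
service_perfdata_file_mode=a
"""

mode = read_nagios_option(sample, "service_perfdata_file_mode")
print("service_perfdata_file_mode =", mode)
```

To check a real file, pass in the result of `open("/path/to/nagios.cfg").read()` with whatever path your install uses.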


#4

My config does not have problems with perfdata or any other config problem.
After some minutes the passive service check counts drop to 0.
This happens with versions 3b01, 3b02, 3b03, 3b04 and this one.
When I run strace on Nagios while it hangs, the result is strange: it seems that it cannot find some temporary file that it already created?
TIA

close(7) = 0
munmap(0xb74cc000, 4096) = 0
time([1193066278]) = 1193066278
time([1193066278]) = 1193066278
time([1193066278]) = 1193066278
time([1193066278]) = 1193066278
write(6, “1193066171||r083||Uptime||OK - -”…, 82) = 82
unlink("/nagios/tmpfs/checkresults/c9wyQy0") = -1 ENOENT (No such file or directory)
unlink("/nagios/tmpfs/checkresults/c9wyQy0.ok") = -1 ENOENT (No such file or directory)
time([1193066278]) = 1193066278
time([1193066278]) = 1193066278
time([1193066278]) = 1193066278
time([1193066278]) = 1193066278
rt_sigaction(SIGPIPE, {0x701aa0, ], SA_RESTORER, 0x670f48}, {SIG_IGN}, 8) = 0
send(3, “<14>Oct 22 12:17:58 nagios: PASS”…, 232, 0) = 232
rt_sigaction(SIGPIPE, {SIG_IGN}, NULL, 8) = 0
open("/nagios/tmpfs/nagios.log", O_RDWR|O_APPEND|O_CREAT, 0666) = 7
time([1193066279]) = 1193066279
fstat64(7, {st_mode=S_IFREG|0664, st_size=172040, …}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb74cc000
write(7, “[1193066279] PASSIVE SERVICE CHE”…, 217) = 217
close(7) = 0
munmap(0xb74cc000, 4096) = 0
time([1193066279]) = 1193066279
time([1193066279]) = 1193066279
time([1193066279]) = 1193066279
time([1193066279]) = 1193066279
write(6, “1193066171||core1-lan||Port1/2-e”…, 195) = 195
unlink("/nagios/tmpfs/checkresults/c9wyQy0") = -1 ENOENT (No such file or directory)
unlink("/nagios/tmpfs/checkresults/c9wyQy0.ok") = -1 ENOENT (No such file or directory)
time([1193066279]) = 1193066279
time([1193066279]) = 1193066279
time([1193066279]) = 1193066279
time([1193066279]) = 1193066279
rt_sigaction(SIGPIPE, {0x701aa0, ], SA_RESTORER, 0x670f48}, {SIG_IGN}, 8) = 0
send(3, “<14>Oct 22 12:17:59 nagios: PASS”…, 119, 0 <unfinished …>
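
For a longer capture, it may be easier to count the failed unlink calls than to read them by eye. This is a minimal sketch that assumes the strace output has been saved as text with plain double quotes around paths, as in the unlink lines above. The function name `count_failed_unlinks` is made up for illustration.

```python
import re
from collections import Counter

# Matches lines such as:
#   unlink("/some/path") = -1 ENOENT (No such file or directory)
FAILED_UNLINK = re.compile(r'unlink\("([^"]+)"\)\s*=\s*-1\s+([A-Z]+)')


def count_failed_unlinks(strace_text):
    """Count failed unlink() calls, keyed by (path, errno name)."""
    counts = Counter()
    for line in strace_text.splitlines():
        match = FAILED_UNLINK.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


# Sample lines copied from the trace above.
sample = '''\
unlink("/nagios/tmpfs/checkresults/c9wyQy0") = -1 ENOENT (No such file or directory)
unlink("/nagios/tmpfs/checkresults/c9wyQy0.ok") = -1 ENOENT (No such file or directory)
time([1193066278]) = 1193066278
unlink("/nagios/tmpfs/checkresults/c9wyQy0") = -1 ENOENT (No such file or directory)
'''

for (path, errno_name), n in count_failed_unlinks(sample).most_common():
    print(n, errno_name, path)
```

If the same path keeps showing up, as c9wyQy0 does in the trace, that suggests Nagios is retrying the delete of one file name and not cleaning up a stream of different ones.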


#5

Perhaps the problem is with this:
In your /etc/xinetd.d/nsca file, you will find these settings:
cps = 9000 30
instances = UNLIMITED
The above is how I have it, but it wasn’t like that by default. What may be happening to you, and did happen to me, is that xinetd was limiting connections too much and connections would get rejected, so I had to restart xinetd periodically. So change your nsca settings to something like the above and maybe that will help.
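
As I understand the xinetd.conf man page, the first cps number is the allowed rate of incoming connections per second and the second is how many seconds the service stays disabled once that rate is exceeded. To get a feel for whether bursts could trip a limit, here is a rough sketch. It only buckets arrival times by whole second and does not model how xinetd actually measures the rate. The function name, the limit of 50, and the arrival pattern are all assumptions for illustration.

```python
from collections import Counter


def seconds_over_cps(arrival_times, cps_limit):
    """Return {second: count} for each whole second with more than cps_limit arrivals."""
    per_second = Counter(int(t) for t in arrival_times)
    return {sec: n for sec, n in sorted(per_second.items()) if n > cps_limit}


# Hypothetical load: 2000 checks spread evenly over 360 s (about 5.6 per second),
# plus a burst of 80 connections inside a single second.
even = [i * 360 / 2000 for i in range(2000)]
burst = [100.0 + i / 100 for i in range(80)]

print(seconds_over_cps(even, 50))          # evenly spread load
print(seconds_over_cps(even + burst, 50))  # same load with the burst added
```

The point of the sketch is that the average rate from 2000 checks in 6 minutes is low, so if a cps limit is being hit it would have to be because the senders deliver their results in bursts.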