I have a Solaris 10 machine (S10), a Solaris 8 machine (S8), and a RHEL 4 machine (RH4).
Nagios 3 is installed on RH4 and Nagios 1 is installed on S8; we are (finally) in the process of upgrading from 1 to 3. The problem I am having is with the NRPE daemon on the S10 (v. 1.8) and S8 (v. 1.2.4) machines interacting with the NRPE client on RH4 (I had to run 1.9 for compatibility).
The symptoms are exactly the same for both S10 and S8, as are the results from all my testing, so I will only present the logs and results from one server. The error I am getting from each:
CHECK_NRPE: Received 0 bytes. Are we allowed to connect to the host?
Simple fix, right? Just add the IP address of RH4 to the allowed_hosts line in nrpe.cfg. I did this and started nrpe up with debugging turned on. Here are the log entries from S10:
Jan 12 14:01:34 paris nrpe: [ID 601491 daemon.notice] Starting up daemon
Jan 12 14:01:34 paris nrpe: [ID 624405 daemon.debug] Listening for connections on port 5666
Jan 12 14:01:34 paris nrpe: [ID 907248 daemon.debug] Allowing connections from: 127.0.0.1,172.27.210.65,172.27.162.100
Jan 12 14:01:45 paris nrpe: [ID 654915 daemon.debug] Connection from 172.27.210.65 port 60838
Jan 12 14:01:45 paris nrpe: [ID 797369 daemon.debug] Host address checks out ok
Jan 12 14:01:45 paris nrpe: [ID 879649 daemon.debug] Handling the connection...
Jan 12 14:01:45 paris nrpe: [ID 881351 daemon.debug] Host is asking for command 'check_var' to be run...
.......Command runs and outputs.........
Jan 12 14:01:45 paris nrpe: [ID 903583 daemon.debug] Connection from 172.27.210.65 closed.
Jan 12 14:01:53 paris nrpe: [ID 654915 daemon.debug] Connection from 172.27.162.100 port 43113
Jan 12 14:01:53 paris nrpe: [ID 381997 daemon.error] Host 172.27.162.100 is not allowed to talk to us!
Jan 12 14:01:53 paris nrpe: [ID 903583 daemon.debug] Connection from 172.27.162.100 closed.
172.27.162.100 is RH4, and 172.27.210.65 is S8. The checks from S8 work just fine, and always have; it's the RH4 address that I'm trying to get working. The daemon specifically says it is allowing connections from 172.27.210.65 as well as 172.27.162.100, yet it then says 172.27.162.100 is not allowed!
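To double-check that reading of the log, here is a minimal Python sketch that pulls the logged allow list and the rejected host out of the syslog lines and compares them. The two sample lines are copied from the S10 log; the function name and the regular expressions are my own illustrative choices, not part of NRPE. It prints each allow-list entry with repr() so that stray whitespace or hidden characters would be visible.

```python
import re

# Two lines copied verbatim from the S10 debug log.
LOG = """\
Jan 12 14:01:34 paris nrpe: [ID 907248 daemon.debug] Allowing connections from: 127.0.0.1,172.27.210.65,172.27.162.100
Jan 12 14:01:53 paris nrpe: [ID 381997 daemon.error] Host 172.27.162.100 is not allowed to talk to us!
"""

# Patterns are assumptions based only on the message text shown in the log.
ALLOW_RE = re.compile(r"Allowing connections from: (.*)$")
DENY_RE = re.compile(r"Host (\S+) is not allowed to talk to us!")


def find_contradictions(log_text):
    """Return (allowed entries, rejected hosts that are also in the allow list)."""
    allowed = []
    conflicts = []
    for line in log_text.splitlines():
        match = ALLOW_RE.search(line)
        if match:
            # Keep the raw entries so odd whitespace stays visible when printed.
            allowed = match.group(1).split(",")
            continue
        match = DENY_RE.search(line)
        if match and match.group(1) in [entry.strip() for entry in allowed]:
            conflicts.append(match.group(1))
    return allowed, conflicts


allowed, conflicts = find_contradictions(LOG)
print([repr(entry) for entry in allowed])
print(conflicts)
```

If a host shows up in the second list, the daemon logged it as allowed and still rejected it, which is the contradiction described above. This only confirms what the log says; it does not explain why the daemon behaves that way.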
I know there's no firewall in place, both because I double checked with our network guys, and because I can see the connection in the nrpe daemon debug log. NRPE is, for some reason, actively rejecting my address, even though it's defined as allowable.
Anyone have any ideas why?
I do notice that the pid field in the logs is incrementing. I wondered if that meant I had another daemon running that I wasn't aware of, one that doesn't show up in ps -ef, so that an old daemon with the old rules was responding. I'm not all that certain how Solaris 10, svcadm, and the various styles of inet work, but when I run svcs and inetadm, I don't see anything in any of the entries that suggests NRPE is also being run as a service or from inetd. I mostly ruled out that option because when I killed the only process that shows in ps -ef, I got connection refused, so I assume there's nothing else running.
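The "kill the daemon, then see whether the port still answers" test can be repeated from any machine with a small Python sketch like the one below. The function name port_is_open is hypothetical; the only facts taken from the log are the hostname (paris) and the port (5666). It can only tell whether something accepts TCP connections on the port, not which process is doing so.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port is accepted, else False."""
    try:
        # create_connection resolves the name and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False
```

For example, with the nrpe process killed, port_is_open("paris", 5666) returning False would match the connection refused result described above, while True would mean something else is still listening on that port.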