NSClient++ and concurrent calls


#1

Hi there,

We are using NSClient++ in two different ways:

  • The first is the normal way, as a Nagios client: NSClientpp is called by Nagios to perform several checks, and the results of these checks are reported in the Nagios web GUI.
  • The second is as a remote execution backdoor, allowing us to run batch jobs on Windows hosts from a scheduler installed on a Linux host. The scheduler is installed on the same server as Nagios, so it can use the check_nrpe command to launch batch jobs on the remote host as external scripts.

Our problem is that when a batch job (the second usage described above) takes some time to run, other checks performed at the same time by the Nagios server (the first usage described above) fail. It seems that, while a “check” (in fact, one of our external scripts) is executing, NSClientpp is unable to handle any other request.

This is a big problem, because some of the remote batch jobs we have to run may take hours to complete, and we then receive alerts for the Nagios checks (because NSClientpp is not available). Our understanding of the situation is that NSClientpp is not multithreaded or multiprocess, and does not fork when it receives a request, which would leave the parent process available to handle other requests (as any daemon should do).

Please note that our problem has nothing to do with the “timeout” settings, which are properly adjusted to our needs on both sides (the Nagios server and the NSClientpp ini file).

Our questions are:

  • Can anyone confirm that NSClientpp does not allow such concurrent calls?
  • Can anyone suggest a solution?

We have read that NSClientpp should be considered deprecated and that NC_net is supposed to be better. However, we have not found much documentation about that client's features, and we cannot tell whether it handles concurrent calls properly. Since we have “some” Windows servers monitored by Nagios, we don’t want to deploy NC_net and replace NSClientpp before being sure of its reliability… So if anyone has information on this, it would be welcome :slight_smile:

Thanks in advance for any replies!

Valentin


#2

Have you tried installing two NSClient daemons on different ports?


#3

NSClient++ should not have any trouble with concurrent calls.
It is multithreaded and each socket request will be handled as a separate thread (this depends a bit on version of course).
Some plugins are not, however, so running multiple calls to them at the same time will cause the second thread to wait until the first completes. But as I understand it, you are using the proxy NRPEClient feature?

Now, something to understand is that NSClient++ was not designed to service many requests simultaneously, so the socket handling is a bit naive in that respect. The main problem is that each request spawns a thread while the cleanup cycle (closing resources after a request) is single-threaded; this is something which will be fixed in the 0.4.x branch. This means that if you have an NSClient++ servicing hundreds of requests you will most likely have some issues. But not really for two…
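For illustration, here is a minimal Python sketch of the thread-per-request model described above. It is not NSClient++ code: the server, the `slow`/`fast` command names and the one-second delay are all assumptions made up for the example. It only shows that, when each connection gets its own thread, a long-running request should not delay a short one.

```python
import socket
import socketserver
import threading
import time


class CheckHandler(socketserver.StreamRequestHandler):
    """Handle one request; 'slow' simulates a long-running external script."""

    def handle(self):
        command = self.rfile.readline().strip().decode()
        if command == "slow":
            time.sleep(1.0)  # hypothetical stand-in for a long batch job
        self.wfile.write(f"OK {command}\n".encode())


class ThreadedServer(socketserver.ThreadingTCPServer):
    allow_reuse_address = True
    daemon_threads = True  # worker threads do not block interpreter exit


def send(port, command):
    """Send one command and return (reply, seconds elapsed)."""
    start = time.monotonic()
    with socket.create_connection(("127.0.0.1", port)) as conn:
        conn.sendall(f"{command}\n".encode())
        reply = conn.makefile().readline().strip()
    return reply, time.monotonic() - start


def demo():
    """Start a slow request, then time a fast one issued while it runs."""
    with ThreadedServer(("127.0.0.1", 0), CheckHandler) as server:
        port = server.server_address[1]
        threading.Thread(target=server.serve_forever, daemon=True).start()
        results = {}
        slow = threading.Thread(
            target=lambda: results.update(slow=send(port, "slow"))
        )
        slow.start()
        time.sleep(0.1)  # let the slow request reach its handler
        results["fast"] = send(port, "fast")
        slow.join()
        server.shutdown()
    return results


if __name__ == "__main__":
    for name, (reply, seconds) in demo().items():
        print(f"{name}: {reply!r} after {seconds:.2f}s")
```

If the fast request only comes back after the slow one has finished, requests are being serialized somewhere, which is the symptom reported in this thread.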

// Michael Medin


#4

[quote=“mickem”]NSClient++ should not have any trouble with concurrent calls.
It is multithreaded and each socket request will be handled as a separate thread (this depends a bit on version of course).
[/quote]

Well, it does seem to have trouble, in fact :wink:

We are using the ExternalScript feature of the NRPE client. Does that have anything to do with the proxy?

And maybe the ExternalScripts feature is one of those that has this problem… Do you know anything about this?

[quote=“mickem”]
Now, something to understand is that NSClient++ was not designed to service many requests simultaneously, so the socket handling is a bit naive in that respect. The main problem is that each request spawns a thread while the cleanup cycle (closing resources after a request) is single-threaded; this is something which will be fixed in the 0.4.x branch. This means that if you have an NSClient++ servicing hundreds of requests you will most likely have some issues. But not really for two…
// Michael Medin[/quote]

Here the problem appears when one “check” is running (and has been for some hours) and another, regular one comes in from Nagios. So, actually, only two…

To luca: Thanks for your reply, but having two instances running on our servers is not acceptable for us.


#5

No, it is more commands like CheckCpu and CheckMem, which use a common shared resource that is protected by a mutex.
CheckExternalScripts should not hang other commands.
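As a rough Python sketch of that distinction (not the actual NSClient++ implementation; the lock, the check names and the delays are hypothetical), two checks that need the same mutex run one after the other, while a check that touches no shared state is not held up:

```python
import threading
import time

# Stands in for the mutex around the shared resource (an assumption).
shared_lock = threading.Lock()


def locked_check(name, hold_seconds, log):
    """A check that needs the shared resource, so it serializes on the lock."""
    with shared_lock:
        log.append((name, "start"))
        time.sleep(hold_seconds)
        log.append((name, "end"))


def independent_check(name, log):
    """A check that touches no shared state and never waits on the lock."""
    log.append((name, "start"))
    log.append((name, "end"))


def demo():
    """Return the order in which the three hypothetical checks ran."""
    log = []
    first = threading.Thread(target=locked_check, args=("cpu-1", 0.3, log))
    second = threading.Thread(target=locked_check, args=("cpu-2", 0.0, log))
    first.start()
    time.sleep(0.05)  # make sure cpu-1 takes the lock first
    second.start()    # blocks until cpu-1 releases the lock
    independent_check("script", log)  # runs while cpu-1 still holds the lock
    first.join()
    second.join()
    return log


if __name__ == "__main__":
    for entry in demo():
        print(entry)
```

In this sketch `cpu-2` cannot start before `cpu-1` ends, but `script` completes in between, which is the behaviour one would expect from an external script that does not use the shared resource.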

To be honest, I have never actually had commands run for hours; that was never really a design goal.
There is a timeout, which defaults to around 30/60 seconds or thereabouts, so I would think that the command would terminate after that timeout period. Have you changed the timeout? If so, to what?

I guess this is something I would have to try myself…

Have you tried running the sample check which “hangs” for 60 seconds and seeing if it responds to other checks while that is running?
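One way to script that test is sketched below in Python. It starts a long command, runs a short one while the long one is still active, and reports whether the short one was answered in the meantime. The helper names and the 0.2 second head start are assumptions; `stand_in_commands` only builds local placeholder commands so the sketch can be tried without an agent.

```python
import subprocess
import sys
import time


def probe(long_cmd, short_cmd):
    """Start long_cmd, run short_cmd while it is active, report what happened."""
    long_proc = subprocess.Popen(long_cmd, stdout=subprocess.DEVNULL)
    time.sleep(0.2)  # give the long check time to reach the agent
    start = time.monotonic()
    short = subprocess.run(short_cmd, capture_output=True, text=True)
    elapsed = time.monotonic() - start
    still_running = long_proc.poll() is None  # None means not finished yet
    long_proc.wait()
    return {
        "short_exit": short.returncode,
        "short_seconds": elapsed,
        "answered_during_long": still_running,
    }


def stand_in_commands(long_seconds=1.5):
    """Local placeholder commands (hypothetical) used instead of real checks."""
    long_cmd = [sys.executable, "-c", f"import time; time.sleep({long_seconds})"]
    short_cmd = [sys.executable, "-c", "print('ok')"]
    return long_cmd, short_cmd


if __name__ == "__main__":
    print(probe(*stand_in_commands()))
```

Against a real agent, the two commands might be something like `["check_nrpe", "-H", "winhost", "-c", "long_batch", "-t", "120"]` and `["check_nrpe", "-H", "winhost", "-c", "check_cpu"]`, where `winhost`, `long_batch` and `check_cpu` are made-up names. If `answered_during_long` comes back false, or the short check times out, the agent is not servicing requests while the long script runs.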

// Michael Medin