Hello,
I have installed Nagios on a Red Hat server and added another server from the same network to the hosts/services/… files so I can monitor it as a test. Right now I'm using only a check_ping command to see whether my configuration is correct. To test it, I shut down the test server (the one that doesn't run Nagios). Nagios noticed immediately that the server went down, but when I powered the test server back up, it took about 20 minutes for Nagios to recognize that it was back up. I set normal_check_interval = 1 so it would recheck the status within a minute, but unfortunately it takes much longer than that. Are there other places where I should modify time intervals to get faster responses? The delay never happens when something goes down, only when it comes back up.
Thank you very much.
In nagios.cfg there is interval_length=, which defines how many seconds one "interval" is. The default is 60; leave it at that for easy math in the rest of your configs. Also in nagios.cfg, if you have aggregate_status_updates=1, then the following is the number of seconds between status updates:
status_update_interval= Mine is 15, which should be fine.
service_reaper_frequency=10 is how often, in seconds, Nagios processes the results of service checks that have finished.
In cgi.cfg:
refresh_rate=300 is the number of seconds between automatic browser refreshes of the CGI pages. Without this, if you were looking at the Service Problems page, it would never refresh unless you reloaded it yourself with F5.
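The settings above add up. Here is a rough sketch, assuming the delays simply stack in the worst case, of how long a state change can take to show up in the web interface. The function name is hypothetical, and the model ignores retries, check execution time and scheduling jitter; the parameter names just mirror the config directives mentioned in this thread.

```python
# Hypothetical helper: rough worst-case estimate (in seconds) of how long a
# state change can take to appear in the Nagios web UI. Simplified model:
# it ignores retries, check execution time and scheduler jitter.

def worst_case_ui_delay(
    normal_check_interval: int,           # services.cfg, in "intervals"
    interval_length: int = 60,            # nagios.cfg, seconds per interval
    service_reaper_frequency: int = 10,   # nagios.cfg, seconds
    status_update_interval: int = 15,     # nagios.cfg, seconds
    refresh_rate: int = 300,              # cgi.cfg, seconds
) -> int:
    # Wait for the next scheduled check (the change may happen right after one)
    check_wait = normal_check_interval * interval_length
    # Then the result waits to be reaped, written to the status data,
    # and finally picked up by the next browser refresh.
    return check_wait + service_reaper_frequency + status_update_interval + refresh_rate


if __name__ == "__main__":
    # normal_check_interval=1 with the other values from this thread:
    # 60 + 10 + 15 + 300 = 385 seconds
    print(worst_case_ui_delay(1))
```

With these numbers the browser refresh dominates, so pressing F5 removes most of the wait. A badly set interval_length, on the other hand, multiplies every check interval.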
In services.cfg, normal_check_interval=5, or 1 as you have it, should be fine, but I'd suggest making it 5. Then set max_check_attempts=3 and retry_check_interval=1. When a service check fails, the service goes into a SOFT state and is rechecked up to 2 more times (3 attempts in total), one retry every interval (60 seconds with the default interval_length). If the third attempt also fails, the state becomes a HARD state and notifications go out. After that, the check runs again every 5 intervals, and so on.
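The retry sequence described above can be sketched as a simple timeline. This is a simplified model, not Nagios code: the function name and the checks_after_hard parameter are hypothetical, and it assumes the service keeps failing, that time starts at the first failed check, and that check execution time is negligible.

```python
from typing import List, Tuple


def failure_timeline(
    normal_check_interval: int = 5,   # intervals between checks in a stable state
    retry_check_interval: int = 1,    # intervals between retries while SOFT
    max_check_attempts: int = 3,      # total attempts before the state goes HARD
    interval_length: int = 60,        # seconds per interval (nagios.cfg)
    checks_after_hard: int = 2,       # hypothetical: how many HARD checks to show
) -> List[Tuple[int, int, str]]:
    """Return (seconds since first failed check, attempt, state type)
    for a service that keeps failing."""
    timeline = []
    t = 0
    for attempt in range(1, max_check_attempts + 1):
        # Every failed attempt is SOFT until the last allowed attempt
        state = "HARD" if attempt == max_check_attempts else "SOFT"
        timeline.append((t, attempt, state))
        if attempt < max_check_attempts:
            t += retry_check_interval * interval_length
    # Once HARD, checks fall back to the normal check interval
    for _ in range(checks_after_hard):
        t += normal_check_interval * interval_length
        timeline.append((t, max_check_attempts, "HARD"))
    return timeline


if __name__ == "__main__":
    for seconds, attempt, state in failure_timeline():
        print(f"t={seconds:>4}s  attempt {attempt}  {state}")
```

With the suggested values, the HARD state (and the notification) is reached 120 seconds after the first failure, and later checks happen every 300 seconds.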
From your description, the cause could be that interval_length is not set correctly, that max_check_attempts is too high, or that retry_check_interval is too high. Go with my numbers or with the defaults from the sample.cfg files.
You're right, I found out that I had messed with the time period interval.
Thank you very much