Hi all
I’m quite new to Nagios (two days now…) and have a couple of (maybe) silly questions. Our environment:
- We’re completely on AIX 5.3 ML03
- Compiling and installing Nagios 2.04b went fine.
- Ping service and parent/child relationships have successfully been set up.
- One thing we are interested in is a consolidated view of the permanent hardware errors on our clients. On AIX you use the “errpt” command to list the most recent error entries and “errpt -a” to get more details. I searched the web but did not find an agent or other software capable of sending the latest errpt entries in a way that meets this requirement. How do others get hardware errors reported?
- If we found something suitable (or wrote our own), how could correlation be done? We have up to 10-12 micro-partitions per IBM Power5 p570, with shared hardware (e.g. a network adapter). We would like to receive only one event per hardware failure. Right now every affected partition would report the failure on its own. As I understand it, Nagios cannot do correlation today, so I searched a bit and found SEC (Simple Event Correlator). I’m very unsure whether it’s the right tool and where to place it. Does it have to run on the Nagios server itself (my guess) or on every single client?
The correlation logic would be as follows:
Search for errpt events within the last 10 minutes. For every event, remember the hostname and look up the related p570 system in a table. Send the event only once per p570.
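
To make that logic concrete, here is a rough sketch in Python. It is only an illustration of the idea, not something we have running: the hostnames, the partition-to-p570 table, the event tuples and the function name are all made up, and it assumes the errpt entries have already been collected somewhere with a hostname and a timestamp attached.

```python
from datetime import datetime, timedelta

# Hypothetical lookup table: partition hostname -> physical p570 system.
PARTITION_TO_P570 = {
    "lpar01": "p570-a",
    "lpar02": "p570-a",
    "lpar03": "p570-b",
}

# Only events from the last 10 minutes are considered.
WINDOW = timedelta(minutes=10)


def correlate(events, now):
    """Return at most one event per p570 from the events inside WINDOW.

    events: iterable of (hostname, timestamp, description) tuples
            (an assumed format, not actual errpt output).
    """
    first_per_p570 = {}
    # Walk the events oldest first so the earliest report per p570 wins.
    for hostname, timestamp, description in sorted(events, key=lambda e: e[1]):
        if now - timestamp > WINDOW:
            continue  # older than 10 minutes, ignore
        p570 = PARTITION_TO_P570.get(hostname)
        if p570 is None:
            continue  # hostname not in the table, ignore
        # Keep only the first event seen for each p570.
        first_per_p570.setdefault(p570, (p570, hostname, timestamp, description))
    return list(first_per_p570.values())


# Arbitrary fixed "current time" and sample events, for illustration only.
now = datetime(2006, 1, 10, 12, 0)
events = [
    ("lpar01", now - timedelta(minutes=2), "network adapter error"),
    ("lpar02", now - timedelta(minutes=1), "network adapter error"),
    ("lpar03", now - timedelta(minutes=30), "disk error"),
    ("lpar03", now - timedelta(minutes=5), "disk error"),
]

for p570, hostname, timestamp, description in correlate(events, now):
    print(p570, hostname, timestamp.isoformat(), description)
```

With the sample data, lpar01 and lpar02 both report within the window but share one p570, so only the earlier report is kept, and the 30-minute-old entry from lpar03 is dropped.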
Can SEC handle that kind of thing? What does the connection to Nagios look like?
Many thanks in advance… Kuffi