I'm quite new to Nagios (two days now...) and have a couple of (maybe) silly questions. Our environment:
- We're completely on AIX 5.3 ML03
- Compiling and installing Nagios 2.04b went well.
- Ping service and parent/child relationships have successfully been set up.
One thing we are interested in is a consolidated view of the permanent hardware errors on our clients. On AIX you use the "errpt" command to list the most recent error entries and "errpt -a" to get more detail. I searched the web and did not find an agent or other software capable of sending the latest errpt entries of that kind (permanent hardware errors) to Nagios. How do others get hardware errors reported?
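If we end up writing our own agent, the client side could start from something like the rough Python sketch below. It assumes the usual six-column summary layout of "errpt" (IDENTIFIER, TIMESTAMP as MMDDhhmmYY, T, C, RESOURCE_NAME, DESCRIPTION), with "P" in the T column for permanent and "H" in the C column for hardware. The sample lines and the function names are made up for illustration, and if I read the man page correctly, "errpt -d H -T PERM" can do the same filtering on the AIX side.

```python
from datetime import datetime

# Made-up sample in the assumed errpt summary layout (not real output).
SAMPLE = """\
IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION
AAAA1111   0613141005 P H ent0           ADAPTER ERROR
BBBB2222   0613142605 I O RMCdaemon      The daemon is started.
"""

def parse_errpt(text):
    """Parse errpt summary output into a list of dicts (hypothetical helper)."""
    entries = []
    for line in text.splitlines():
        # Split into at most six fields so the description keeps its spaces.
        fields = line.split(None, 5)
        if len(fields) < 6 or fields[0] == "IDENTIFIER":
            continue  # skip the header and blank or malformed lines
        ident, stamp, err_type, err_class, resource, description = fields
        entries.append({
            "id": ident,
            # Assumed timestamp format: MMDDhhmmYY
            "time": datetime.strptime(stamp, "%m%d%H%M%y"),
            "type": err_type,
            "class": err_class,
            "resource": resource,
            "description": description.strip(),
        })
    return entries

def permanent_hardware_errors(entries):
    """Keep only entries of type P (permanent) and class H (hardware)."""
    return [e for e in entries if e["type"] == "P" and e["class"] == "H"]
```

In practice the text would come from running "errpt" on each partition rather than from a constant.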
If we found something suitable (or wrote our own), how could correlation be done? We have up to 10-12 micro-partitions per IBM Power5 p570, sharing hardware (e.g. a network adapter). We would like to receive only one event per hardware failure, but right now every affected partition reports its own failure. As far as I understand, Nagios cannot do correlation today, so I searched a bit and found SEC (Simple Event Correlator). I'm very unsure whether it's the right tool and where to place it. Does it have to run on the Nagios server itself (my guess) or on every single client?
The correlation logic would be as follows:
Search for errpt events within the last 10 minutes. For every event, remember the hostname and look up the related p570 system in a table. Send the event only once per p570.
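To make the logic concrete, here is a minimal Python sketch of what I have in mind, independent of whether SEC or something else ends up doing it. The hostnames, the lookup table and the function name are all hypothetical, and events are assumed to arrive as (hostname, timestamp, description) tuples.

```python
from datetime import datetime, timedelta

# Hypothetical lookup table: partition hostname -> p570 system it runs on.
PARTITION_TO_FRAME = {
    "lpar01": "p570-a",
    "lpar02": "p570-a",
    "lpar03": "p570-b",
}

def correlate(events, now, window=timedelta(minutes=10),
              table=PARTITION_TO_FRAME):
    """Return at most one event per p570 from the last `window`.

    `events` is an iterable of (hostname, timestamp, description) tuples.
    The earliest event inside the window wins for each p570.
    """
    first_per_frame = {}
    for hostname, timestamp, description in sorted(events, key=lambda e: e[1]):
        if not (now - window <= timestamp <= now):
            continue  # outside the 10-minute window
        # Hosts missing from the table are treated as their own system.
        frame = table.get(hostname, hostname)
        # setdefault keeps the first event seen for this p570 and drops the rest.
        first_per_frame.setdefault(frame, (hostname, timestamp, description))
    return first_per_frame
```

This keys only on the p570, exactly as described above; two different failures on the same p570 within 10 minutes would collapse into one event, so the key might need to include the failing resource as well.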
Can SEC handle that kind of thing? What does the connection to Nagios look like?
Many thanks in advance... Kuffi