Nagiostat creating rrd databases, but not creating graphics


#1

I have installed the latest version of RRDTool and Nagiostat on a fresh Nagios 2.3 install. Nagios is running fine. I ran the Perl scripts that came with RRDTool and they created the appropriate graphics. After installing Nagiostat, I defined a host in my config files, and I am getting an RRD file in my /Archive folder. But when I attempt to view the collected data via a browser, I get the correct page with the graphs themselves missing. Just red X’s in IE. Below are snippets from my configs:

serviceextinfo.cfg:

# 'HOU-ADM-XP-01' Service Extended Information

define serviceextinfo{
host_name ADM-XP-01
service_description Ping
notes_url /nagiostat/nagiostat.cgi?graph_name=ADM-XP-01_Ping
icon_image graph.gif
icon_image_alt ADM-XP-01 Ping Graph
}

nagiostat.conf:

# RRDArchiveFile RRDCreateTemplate HostRegex ServiceRegex ValueRegexTemplate

InsertValue ADM-XP-01_ping.rrd ping_1min /ADM-XP-01/ /Ping/ ping_rta_pktloss

# GRAPHNAME RRDFILENAME GraphTimeTemplate PlotTemplate HTML-Template Title

Graph ADM-XP-01_Ping ADM-XP-01_Ping.rrd std_1year ping_rta default.html "Ping ADM-XP-01 RTA"

Any ideas what I might be missing here?

Thanks!


#2

From another thread:
meulie.net/portal_plugins/fo … c.php?6194

"If converting to V2.x from V1.x, or even simply attempting to use nagiostat (an RRD graphing utility that is NOT included with Nagios, i.e. not to be confused with the new nagiostats utility), there will be many macros that you need to be aware of.
Edit the nagiostat file and change all occurrences of:
$LASTCHECK to $LASTSERVICECHECK
$OUTPUT to $SERVICEOUTPUT
$PERFDATA to $SERVICEPERFDATA

Any scripts that you wrote yourself may be affected too, so you should read through:
nagios.sourceforge.net/docs/2_0/macros.html"
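For anyone who would rather script the three renames than hand-edit them, here is a minimal Python sketch. It only does the plain text substitutions listed in the quote above; the function name `rename_macros` is made up for illustration, and it works on a string, so reading and writing the nagiostat file (after taking a backup) is left to you.

```python
# Old Nagios 1.x macro names mapped to their 2.x names, as listed in the quoted post.
MACRO_RENAMES = {
    "$LASTCHECK": "$LASTSERVICECHECK",
    "$OUTPUT": "$SERVICEOUTPUT",
    "$PERFDATA": "$SERVICEPERFDATA",
}


def rename_macros(text):
    """Return text with the old macro names replaced by the new ones.

    Running it twice is harmless: the new names do not contain the old
    ones as substrings (the character before OUTPUT in $SERVICEOUTPUT
    is an E, not a $).
    """
    for old, new in MACRO_RENAMES.items():
        text = text.replace(old, new)
    return text


if __name__ == "__main__":
    sample = "$LASTCHECK$|!!|$HOSTNAME$|!!|$OUTPUT$|!!|$PERFDATA$"
    print(rename_macros(sample))
```

Count the occurrences before and after if you want to be sure nothing was missed.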


#3

This is a fresh install. Here is the checkcommand I am using for the graphs that are missing.

# PERF-DATA-HANDLER

define command {
command_name service-perf-data-handler
command_line /usr/local/nagios/nagiostat/nagiostat -p "$LASTSERVICECHECK$|!!|$HOSTNAME$|!!|$SERVICEDESC$|!!|$SERVICESTATE$|!!|$SERVICEOUTPUT$|!!|$SERVICEPERFDATA$"
}

Thanks!


#4

Fresh or not fresh, doesn’t matter, please do as suggested or nagiostat will absolutely not work with 2.x.

I asked you to edit the /usr/local/nagios/nagiostat/nagiostat file, not the services.cfg file.


#5

Sorry about that. It’s usually me giving people a hard time about not reading for content…

OK, I have now edited the /usr/local/nagios/nagiostat/nagiostat file. I replaced the three variables with the new ones (there were 3 occurrences of each one in the nagiostat file).

After restarting Nagios & Apache, I’m still only getting the red X’s where the graphics should be.

On the other hand, the *.rrd files are updating constantly, so that part seems to be working.

Thanks for the assistance!


#6

Did you configure the nagiostat.conf file?
HTMLTemplatePath /usr/local/nagios/nagiostat/html-templates
RRDToolPath /usr/local/rrdtool-1.2.12/bin/rrdtool
RRDArchivePath /usr/local/nagios/nagiostat/archives

It sounds like you did, since .rrd files are being updated.
Turn on debug in the nagiostat file and set it to 3.
Check the debug log for problems.
Double check your “Graph” section in nagiostat.conf. It sounds like you are not getting to the proper Graph web page.


#7

OK, one step forward one step back.

We have graphs!! But they are empty???

The *.rrd files’ timestamps are changing, but their size is staying constant.

Weird!


#8

RRD files will always stay the same size. That is how round-robin database files work. The file is created at its full size, but with no data in it. The DB is then filled up to capacity, and after that it’s first in, first out, round robin.
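To put a number on that fixed size, here is a rough Python sketch that estimates the data area of an RRD from its create template. It assumes each stored value is an 8-byte double and ignores the file header, so the real file will be somewhat larger; the function names are just for illustration. The template string is the ping_1min one from the nagiostat.conf in this thread.

```python
# ping_1min create template as it appears in the nagiostat.conf in this thread.
PING_1MIN = (
    "--step 60 DS:rta:GAUGE:120:0:5000 DS:Pktloss:GAUGE:120:0:100 "
    "RRA:AVERAGE:0.5:1:1800 RRA:AVERAGE:0.5:7:1850 "
    "RRA:AVERAGE:0.5:24:1860 RRA:AVERAGE:0.5:290:1820"
)


def rra_rows(create_template):
    """Row count of every RRA:cf:xff:steps:rows definition."""
    return [int(tok.split(":")[-1])
            for tok in create_template.split() if tok.startswith("RRA:")]


def ds_count(create_template):
    """Number of DS: (data source) definitions."""
    return sum(1 for tok in create_template.split() if tok.startswith("DS:"))


def data_area_bytes(create_template, bytes_per_value=8):
    """Approximate size of the stored values, header not included."""
    return sum(rra_rows(create_template)) * ds_count(create_template) * bytes_per_value


if __name__ == "__main__":
    print(data_area_bytes(PING_1MIN))  # 7330 rows * 2 data sources * 8 bytes = 117280
```

That works out to roughly 114.5 KiB of values, allocated up front whether or not anything has been written yet.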


#9

I guess I was expecting them to be a bit larger than 116.2 KB.

I have turned on debug in my nagiostat file. Here is a brief sample from it.

Fri May 12 16:17:09 2006
**INCOMING PERFDATA:
LASTCHECK=1147468624
HOSTNAME=HOU-CLC4000-01
SERVICEDESCR="Ping"
SERVICESTATE="OK"
OUTPUT="PING OK - Packet loss = 0%, RTA = 1.34 ms"
PERFDATA=""
+VALUE: 1.34
+VALUE: 0
=INSERT into 'HOU-CLC4000-01_ping.rrd': 1.34,0 DSA-names=rta,pktloss
!RRDCMDLINE: /usr/local/rrdtool-1.2.13/bin/rrdtool update /usr/local/nagios/nagiostat/archives/HOU-CLC4000-01_ping.rrd --template rta:Pktloss 1147468624:1.34:0

Fri May 12 16:17:18 2006
**INCOMING PERFDATA:
LASTCHECK=1147468626
HOSTNAME=HOU-NAS-01
SERVICEDESCR="Ping"
SERVICESTATE="OK"
OUTPUT="PING OK - Packet loss = 0%, RTA = 0.51 ms"
PERFDATA=""
+VALUE: 0.51
+VALUE: 0
=INSERT into 'HOU-NAS-01_ping.rrd': 0.51,0 DSA-names=rta,pktloss
!RRDCMDLINE: /usr/local/rrdtool-1.2.13/bin/rrdtool update /usr/local/nagios/nagiostat/archives/HOU-NAS-01_ping.rrd --template rta:Pktloss 1147468626:0.51:0

Fri May 12 16:17:18 2006
**INCOMING PERFDATA:
LASTCHECK=1147468629
HOSTNAME=HOU-TMG-AP-01
SERVICEDESCR="Ping"
SERVICESTATE="OK"
OUTPUT="PING OK - Packet loss = 0%, RTA = 4.89 ms"
PERFDATA=""
+VALUE: 4.89
+VALUE: 0
=INSERT into 'HOU-TMG-AP-01_ping.rrd': 4.89,0 DSA-names=rta,pktloss
!RRDCMDLINE: /usr/local/rrdtool-1.2.13/bin/rrdtool update /usr/local/nagios/nagiostat/archives/HOU-TMG-AP-01_ping.rrd --template rta:Pktloss 1147468629:4.89:0

Any other areas I should be looking for the reason for the lack of output?


#10

Should PERFDATA have something between the quotes?


#11

OK, next question/observation.
I just noticed that a few of my graphs have a few pieces of data in them. We’re talking about a single green pixel here and there, scattered about without rhyme or reason. What could cause it to write data to the graph at some times and not at others?


#12

You don’t need anything showing in perfdata if you are not sampling that data.
There are many examples in the nagiostat.conf file that you may have noticed, for instance:
ValueRegexTemplate ftp_response "output:rt:/- ([0-9.]+)/"
The above does not need any perfdata, since it’s not using perfdata for parsing. It’s using the output. And, as shown in your debug, data is in fact being inserted into an rrd table, i.e.
!RRDCMDLINE: /usr/local/rrdtool-1.2.13/bin/rrdtool update /usr/local/nagios/nagiostat/archives/HOU-CLC4000-01_ping.rrd --template rta:Pktloss 1147468624:1.34:0
But just because there is data in it doesn’t mean you are going to see that data if you don’t have a graph template set up.
There are examples of this in the .conf file:
#Graph sunet-ping sunet-ping.rrd std_1year ping_rta default.html "Ping ftp.sunet.se RTA"

Perhaps you don’t have the correct permissions on the archive directory.
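To make the output-versus-perfdata point concrete, here is a small Python sketch of how a ValueRegexTemplate entry could be applied to a plugin's output line. It is not nagiostat's actual code (that is Perl); the helper names are invented, and treating any source other than "output" as perfdata is an assumption made for the sketch.

```python
import re


def parse_value_spec(spec):
    """Split 'source:dsname:/regex/' into (source, ds_name, compiled regex)."""
    source, ds_name, pattern = spec.split(":", 2)
    return source, ds_name, re.compile(pattern.strip("/"))


def extract_values(specs, output, perfdata=""):
    """Apply each spec to the plugin output (or perfdata) and collect the captures."""
    values = {}
    for spec in specs:
        source, ds_name, regex = parse_value_spec(spec)
        # Assumption: anything that is not "output" is read from perfdata.
        text = output if source == "output" else perfdata
        match = regex.search(text)
        if match:
            values[ds_name] = float(match.group(1))
    return values


if __name__ == "__main__":
    ping_rta_pktloss = [
        "output:rta:/RTA = ([0-9.]+) ms/",
        r"output:Pktloss:/loss = (\d+)%/",
    ]
    line = "PING OK - Packet loss = 0%, RTA = 1.34 ms"
    print(extract_values(ping_rta_pktloss, line))
```

With an empty perfdata string both values still come out of the output text, which matches the two +VALUE lines in the debug log.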


#13

I doubt it’s randomly writing data. What I do suspect is that you have more than one nagios running. Shut down nagios, and make sure they are all dead with a ps -ef|grep nagios
The other possibility is that one of your InsertValue lines is inserting data into the wrong graph, due to a lack of quotes or something. I have seen this myself by looking at the debug log: data from one service was being inserted into the wrong host, just because of this:
hostA free-Disk
hostB free-Disk
Same service name, different hosts. Why it does this, I don’t know, so I tried using quotes and such until it worked.
Bottom line: look at the debug.log closely, and make sure data is going where it should.
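If the debug log is long, a quick script can do the eyeballing. The Python sketch below walks a debug log in the format posted earlier in this thread and flags any =INSERT whose rrd file name does not start with the HOSTNAME of the same block. That only makes sense if your files follow the `<host>_<something>.rrd` naming used here, which is an assumption, and the function name is made up.

```python
import re

# Accept both straight and typographic quotes around the file name.
INSERT_RE = re.compile(r"^=INSERT into ['‘](?P<file>[^'’]+)['’]")


def find_misrouted_inserts(log_text):
    """Return (hostname, rrd_file) pairs where the file is not named after the host."""
    suspicious = []
    host = None
    for raw in log_text.splitlines():
        line = raw.strip()
        if line.startswith("HOSTNAME="):
            host = line.split("=", 1)[1]
            continue
        match = INSERT_RE.match(line)
        if match and host is not None:
            rrd_file = match.group("file")
            if not rrd_file.startswith(host + "_"):
                suspicious.append((host, rrd_file))
    return suspicious


if __name__ == "__main__":
    sample = """\
HOSTNAME=hostA
=INSERT into 'hostA_disk.rrd': 10 DSA-names=value
HOSTNAME=hostB
=INSERT into 'hostA_disk.rrd': 20 DSA-names=value
"""
    print(find_misrouted_inserts(sample))
```

An empty list means every insert went to a file named after its own host.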


#14

Jakkedup, I want to thank you for your help. I see you providing a large amount of support to a ton of people. It’s appreciated.

OK, I stopped the nagios service and verified that I only have one running.

For the sake of troubleshooting I have reduced my nagios config files down to one host/service.

I have checked the permissions on my /nagiostat/archives folder, they are as follows:
-owner=nagios
-group=nagios
-permissions=764

nagiostat.conf

# Pointer to where the rrd-tool binary is located

RRDToolPath /usr/local/rrdtool-1.2.13/bin/rrdtool

# Pointer to where HTML-templates are stored

HTMLTemplatePath /usr/local/nagios/nagiostat/html-templates

# Pointer to HTML-template for the index-page (relative to HTMLTemplatePath)

GraphIndexTemplate graphindex.html

# Pointer to directory where the RRD-archive-files are stored

RRDArchivePath /usr/local/nagios/nagiostat/archives

# How many graphs per HTML-page and which time-periods they should represent

# TemplateName Definitions (format=::)

GraphTimeTemplate std_1year -2hour:-2min:"Hourly graph" -30hours:-2min:"Daily graph" -9days:-2min:"Weekly graph" -1month:-2min:"Monthly graph" -1year:-2min:"Yearly graph"

RRDCreateTemplate ping_5min --step 300 DS:rta:GAUGE:600:0:5000 DS:Pktloss:GAUGE:600:0:100 RRA:AVERAGE:0.5:1:396 RRA:AVERAGE:0.5:6:336 RRA:AVERAGE:0.5:24:480 RRA:AVERAGE:0.5:234:480
RRDCreateTemplate ping_1min --step 60 DS:rta:GAUGE:120:0:5000 DS:Pktloss:GAUGE:120:0:100 RRA:AVERAGE:0.5:1:1800 RRA:AVERAGE:0.5:7:1850 RRA:AVERAGE:0.5:24:1860 RRA:AVERAGE:0.5:290:1820
RRDCreateTemplate stdvalue1_5min --step 300 DS:rta:GAUGE:600:0:5000 DS:value:GAUGE:600:0:100 RRA:AVERAGE:0.5:1:396 RRA:AVERAGE:0.5:6:336 RRA:AVERAGE:0.5:24:480 RRA:AVERAGE:0.5:234:480
RRDCreateTemplate ifrate_1min --step 60 DS:in:GAUGE:120:0:100000 DS:out:GAUGE:120:0:100000 RRA:AVERAGE:0.5:1:1800 RRA:AVERAGE:0.5:7:1850 RRA:AVERAGE:0.5:24:1860 RRA:AVERAGE:0.5:290:1820

# Template-Name PLOT-PARAMETERS

PlotTemplate ping_rta --start $s --end $e DEF:rta=$f:rta:AVERAGE LINE1:rta#00A000:"Roundtrip average (ms)" HRULE:100#D0D000:"Warning level" HRULE:400#E00000:"Critical level" GPRINT:rta:MAX:"Roundtrip MAX\: %.4lgms" GPRINT:rta:MIN:"Roundtrip MIN\: %.4lgms" GPRINT:rta:AVERAGE:"Roundtrip average\: %.4lgms"
PlotTemplate ping_pktloss --start $s --end $e DEF:Pktloss=$f:Pktloss:AVERAGE LINE1:Pktloss#00A000:"Packetloss (%)" HRULE:20#D0D000:"Warning level" HRULE:50#E00000:"Critical level" GPRINT:Pktloss:MAX:"Pktloss MAX\: %.4lg%%" GPRINT:Pktloss:MIN:"Pktloss MIN\: %.4lg%%" GPRINT:Pktloss:AVERAGE:"Pktloss average\: %.4lg%%"
PlotTemplate ts_sessions --start $s --end $e DEF:value=$f:value:AVERAGE LINE1:value#A00000:"TS Sessions (#)"

PlotTemplate ifrate --start $s --end $e -X 1 DEF:in=$f:in:AVERAGE DEF:out=$f:out:AVERAGE AREA:in#00D000:"Inbound traffic (kbit/s)" LINE1:out#0000A0:"Outbound traffic (kbit/s)"

ValueRegexTemplate ping_rta_pktloss "output:rta:/RTA = ([0-9.]+) ms/" "output:Pktloss:/loss = (\d+)%/"
ValueRegexTemplate generic_int "output:value:/(\d+)/"
ValueRegexTemplate ifrate_in_out "output:in:/\[IN\]=(\d+) kbit/" "output:out:/\[OUT\]=(\d+) kbit/"

# RRDArchiveFile RRDCreateTemplate HostRegex ServiceRegex ValueRegexTemplate

InsertValue HOU-NAS-01_ping.rrd ping_1min /HOU-NAS-01/ /PING/ ping_rta_pktloss

# GRAPHNAME RRDFILENAME GraphTimeTemplate PlotTemplate HTML-Template Title

Graph HOU-NAS-01_ping HOU-NAS-01_ping.rrd std_1year ping_rta default.html "Ping HOU-NAS-01 RTA"

httpd.conf
Alias /nagiostat/ /usr/local/nagios/nagiostat/
<Directory /usr/local/nagios/nagiostat>
AllowOverride AuthConfig
Options +ExecCGI
AddHandler cgi-script cgi pl
Order allow,deny
Allow from all
</Directory>

checkcommands.cfg

# PERF-DATA-HANDLER

define command {
command_name service-perf-data-handler
command_line /usr/local/nagios/nagiostat/nagiostat -p "$LASTSERVICECHECK$|!!|$HOSTNAME$|!!|$SERVICEDESC$|!!|$SERVICESTATE$|!!|$SERVICEOUTPUT$|!!|$SERVICEPERFDATA$"
}

hosts.cfg
################## HOU-NAS-01 ######################
define host{
use generic-host
host_name HOU-NAS-01
alias HOU-NAS-01
address 10.1.1.16
check_command check-host-alive
max_check_attempts 10
notification_interval 120
notification_period 24x7
notification_options d,u,r
contact_groups nt-admins
}

# 'check_ping' Service Definition

define service{
use generic-service
host_name HOU-NAS-01
service_description Ping
is_volatile 0
check_period 24x7
max_check_attempts 3
normal_check_interval 5
retry_check_interval 1
contact_groups nt-admins
notification_interval 120
notification_period 24x7
notification_options c,r
check_command check_ping!100.0,20%!500.0,60%
}

hostsextinfo.cfg

# 'HOU-NAS-01' Host Extended Information

define hostextinfo{
host_name HOU-NAS-01
notes_url ## http://
icon_image linux40.gif
icon_image_alt HOU-NAS-01
vrml_image linux40.gif
statusmap_image linux40.png
2d_coords 350,350
3d_coords #100.0,50.0,75.0
}

What else should I post to figure this out?


#15

Sample from debug.log
Sun May 14 12:59:48 2006
**INCOMING PERFDATA:
LASTCHECK=1147629578
HOSTNAME=HOU-NAS-01
SERVICEDESCR="PING"
SERVICESTATE="OK"
OUTPUT="PING OK - Packet loss = 0%, RTA = 0.32 ms"
PERFDATA=""
+VALUE: 0.32
+VALUE: 0
=INSERT into 'HOU-NAS-01_ping.rrd': 0.32,0 DSA-names=rta,pktloss
!RRDCMDLINE: /usr/local/rrdtool-1.2.13/bin/rrdtool update /usr/local/nagios/nagiostat/archives/HOU-NAS-01_ping.rrd --template rta:Pktloss 1147629578:0.32:0

Sun May 14 13:04:48 2006
**INCOMING PERFDATA:
LASTCHECK=1147629878
HOSTNAME=HOU-NAS-01
SERVICEDESCR="PING"
SERVICESTATE="OK"
OUTPUT="PING OK - Packet loss = 0%, RTA = 0.34 ms"
PERFDATA=""
+VALUE: 0.34
+VALUE: 0
=INSERT into 'HOU-NAS-01_ping.rrd': 0.34,0 DSA-names=rta,pktloss
!RRDCMDLINE: /usr/local/rrdtool-1.2.13/bin/rrdtool update /usr/local/nagios/nagiostat/archives/HOU-NAS-01_ping.rrd --template rta:Pktloss 1147629878:0.34:0

Sun May 14 13:09:48 2006
**INCOMING PERFDATA:
LASTCHECK=1147630178
HOSTNAME=HOU-NAS-01
SERVICEDESCR="PING"
SERVICESTATE="OK"
OUTPUT="PING OK - Packet loss = 0%, RTA = 0.32 ms"
PERFDATA=""
+VALUE: 0.32
+VALUE: 0
=INSERT into 'HOU-NAS-01_ping.rrd': 0.32,0 DSA-names=rta,pktloss
!RRDCMDLINE: /usr/local/rrdtool-1.2.13/bin/rrdtool update /usr/local/nagios/nagiostat/archives/HOU-NAS-01_ping.rrd --template rta:Pktloss 1147630178:0.32:0

Sun May 14 13:14:48 2006
**INCOMING PERFDATA:
LASTCHECK=1147630478
HOSTNAME=HOU-NAS-01
SERVICEDESCR="PING"
SERVICESTATE="OK"
OUTPUT="PING OK - Packet loss = 0%, RTA = 0.33 ms"
PERFDATA=""
+VALUE: 0.33
+VALUE: 0
=INSERT into 'HOU-NAS-01_ping.rrd': 0.33,0 DSA-names=rta,pktloss
!RRDCMDLINE: /usr/local/rrdtool-1.2.13/bin/rrdtool update /usr/local/nagios/nagiostat/archives/HOU-NAS-01_ping.rrd --template rta:Pktloss 1147630478:0.33:0

I thought an image of my results might be helpful…maybe not. It’s just weird!!!

http://12.163.202.46/images/Nagios/Nagiostat.jpg
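One side check that nobody in this thread has suggested, so take it as something to rule out rather than a diagnosis: in an rrdtool DS definition (DS:name:TYPE:heartbeat:min:max) the heartbeat is the longest gap allowed between two updates before the value is stored as unknown. The Python sketch below compares the LASTCHECK timestamps from the log above with the heartbeat in the ping_1min template from the posted nagiostat.conf. The function names are invented for the example.

```python
# ping_1min create template, copied from the nagiostat.conf posted in this thread.
PING_1MIN = (
    "--step 60 DS:rta:GAUGE:120:0:5000 DS:Pktloss:GAUGE:120:0:100 "
    "RRA:AVERAGE:0.5:1:1800 RRA:AVERAGE:0.5:7:1850 "
    "RRA:AVERAGE:0.5:24:1860 RRA:AVERAGE:0.5:290:1820"
)

# LASTCHECK values from the four debug.log entries above.
LOG_TIMESTAMPS = [1147629578, 1147629878, 1147630178, 1147630478]


def ds_heartbeats(create_template):
    """Map each data source name to its heartbeat in seconds."""
    result = {}
    for token in create_template.split():
        if token.startswith("DS:"):
            parts = token.split(":")  # DS, name, type, heartbeat, min, max
            result[parts[1]] = int(parts[3])
    return result


def gaps_over_heartbeat(timestamps, heartbeat):
    """Return (previous, current, gap) for every update gap longer than the heartbeat."""
    ordered = sorted(timestamps)
    return [(a, b, b - a) for a, b in zip(ordered, ordered[1:]) if b - a > heartbeat]


if __name__ == "__main__":
    for name, heartbeat in ds_heartbeats(PING_1MIN).items():
        late = gaps_over_heartbeat(LOG_TIMESTAMPS, heartbeat)
        print(name, heartbeat, late)
```

If the gaps come out longer than the heartbeat, it may be worth trying the ping_5min template, which the same config defines with --step 300 and a 600 second heartbeat, to match a 5 minute check interval.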


#16

I don’t see how it’s working at all due to this:
define service{
use generic-service
host_name HOU-NAS-01
service_description Ping

InsertValue HOU-NAS-01_ping.rrd ping_1min /HOU-NAS-01/ /PING/ ping_rta_pktloss

Your service description is not the same. What would happen if you had 2 services for that host, one with PING and the other with Ping? Try making them the same, matching case exactly.


#17

Would you please provide an example of your service description you would use to graph ping? I am so close I can taste it. I have created a new service with the following:

define service{
use generic-service
host_name HOU-NAS-01
service_description Ping_GRP
is_volatile 0
check_period 24x7
max_check_attempts 3
normal_check_interval 5
retry_check_interval 1
contact_groups nt-admins
notification_interval 120
notification_period 24x7
notification_options c,r
check_command service-perf-data-handler
}

But I am getting no output.


#18

Call it anything you like, just make sure the CASE is the same. You can’t define a service with a service_description of Ping, and then in nagiostat you tell it that the service_description is PING. See what I mean? Like I said, I can’t see how it works at all, and that may just be what is wrong.
So just change the nagiostat.conf to Ping, just like the service_description.
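The case point is easy to demonstrate. Below is a rough Python sketch that takes an InsertValue line, pulls out the host and service regexes, and tests them against a host name and service_description. It assumes the regexes are applied case-sensitively, as described above, and that neither regex contains spaces; it is an illustration with a made-up function name, not nagiostat's own matching code.

```python
import re


def insert_value_matches(line, hostname, service_description):
    """True if the InsertValue line's host and service regexes both match."""
    # Fields: InsertValue <rrd file> <create template> /<host regex>/
    #         /<service regex>/ <value regex template>
    fields = line.split()
    host_re = fields[3].strip("/")
    service_re = fields[4].strip("/")
    return (re.search(host_re, hostname) is not None
            and re.search(service_re, service_description) is not None)


if __name__ == "__main__":
    upper = "InsertValue HOU-NAS-01_ping.rrd ping_1min /HOU-NAS-01/ /PING/ ping_rta_pktloss"
    mixed = "InsertValue HOU-NAS-01_ping.rrd ping_1min /HOU-NAS-01/ /Ping/ ping_rta_pktloss"
    print(insert_value_matches(upper, "HOU-NAS-01", "Ping"))
    print(insert_value_matches(mixed, "HOU-NAS-01", "Ping"))
```

Only the line whose service regex has the same case as the service_description matches.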


#19

In other words,

define service{
use generic-service
host_name HOU-NAS-01
service_description Ping

InsertValue HOU-NAS-01_ping.rrd ping_1min /HOU-NAS-01/ /Ping/ ping_rta_pktloss


#20

If that doesn’t work, then try quotes like this:
InsertValue HOU-NAS-01_ping.rrd ping_1min /"HOU-NAS-01"/ /Ping/ ping_rta_pktloss

It might be puking on those dashes in the hostname, but really, the debug output seems to look just fine. Clear out the debug file, delete the rrd archive files, and make sure it creates the file by itself and that the debug data is valid,
i.e. that it inserted data for the right host and the right service.