Confused Development

I develop software and I often get confused in the process. I usually find the answers after a while, but a month later I can't remember them. So from now on, I will write them down here.

Friday, January 23, 2009

Creating Bar Charts with Gnuplot

As part of the work on my thesis, I'm currently analysing lots of log files that show the access patterns to the Semantic Web Dogfood linked data server. In the end, I want to auto-generate nice-looking bar charts from them which show usage over time. Having generated lots of CSV files (using a combination of Webalizer, standard Unix commands like sed and grep, and finally some Ruby scripting), my first idea was to do this with Excel. However, while it is certainly possible and easy enough to generate one chart with Excel, I couldn't see an easy way to do it in batches (maybe with AppleScript?). So I decided to use gnuplot, which has a steeper learning curve but is so much easier to configure and automate.

Installation was easy enough on Mac OS X 10.5. I tried a MacPorts port first, but unfortunately that just generated lots of errors during the installation process. As a last resort, I tried installing gnuplot from source (I'm always a little scared of doing that...) and it was a breeze! I had expected it to be the other way around…

Anyway, after going through some short tutorials and the gnuplot documentation, I finally came up with this plot script to generate the bar charts, which reads its data from a 'numbers.dat' file:

#our dat file looks like this:
##   date    hits   files   pages  visits   sites  kbytes
#20081126    1854    1080    1811     246     136   18060

# we want the x-axis to be a time line
set xdata time

# by default, the x-axis is defined by the first column in the 
# .dat file.  the format of our dates is like "20081230", we 
# need to tell gnuplot about that:
set timefmt "%Y%m%d"

# rotate the labels on the x-axis by 90°
set xtics rotate

# set the output terminal to colour encapsulated PostScript with black text
set term postscript eps color blacktext

# we want to output the plot to a file:
set output "hits.eps"

# we want to generate solid looking bars for our bar charts, so
# we tell gnuplot to use a solid fill style
set style fill solid 0.5

# now plot the second (hits) and third (files) column from our 
# data file, using boxes instead of the default crosses
plot "numbers.dat" using 1:2 title 'hits' with boxes,\
     "numbers.dat" using 1:3 title 'files' with boxes

And the output is:

[Figure: Log File Visualisation with Gnuplot]
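
Since the whole point was to generate charts in batches, here is a rough sketch of how the same settings could produce one chart per column in a single run. It assumes gnuplot 4.6 or later (for the "do for" loop; word() and words() need 4.4 or later) and the same numbers.dat layout as above. The "usage_" output file prefix is just a made-up name for this example, and I haven't checked this against older gnuplot versions.

```gnuplot
# Sketch only: assumes gnuplot >= 4.6 and the numbers.dat layout shown above.
set xdata time
set timefmt "%Y%m%d"
set xtics rotate
set term postscript eps color blacktext
set style fill solid 0.5

# column names in file order; the first name belongs to column 2
names = "hits files pages visits sites kbytes"

do for [i=1:words(names)] {
    name = word(names, i)
    # one EPS file per column: usage_hits.eps, usage_files.eps, ...
    # ("usage_" is a hypothetical prefix chosen for this example)
    set output "usage_" . name . ".eps"
    plot "numbers.dat" using 1:(column(i+1)) title name with boxes
}

# close the last output file
set output
```

Saved as, say, batch.gp (again a made-up name), this would be run with "gnuplot batch.gp" from the directory that contains numbers.dat.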

P.S.: Very handy for editing the plot file is the gnuplot bundle for TextMate.


4 Comments:

At 3:39 pm, Blogger Unknown said...

Hi Knud,

looks cool :)

I've seen that Gnuplot 4.2 has a way of enabling transparency, which might make the graph look even cooler, and also show any lines that would otherwise be hidden.

Maybe

set style fill transparent solid 0.5 noborder

does the trick?

Andreas.

 
At 3:51 pm, Blogger Knud Möller said...

Good idea, I'll try that! However, in this case there are no hidden lines, because there will never be more hits than files (except if we get a lot of 404s).

 
At 6:48 pm, Blogger tommycarstensen said...

The transparency doesn't work for me in postscript, and I need postscript, because the X11 terminal doesn't allow me to insert symbols. You have a solution to my problem? :-)

 
