Statistics? Graphics?

David Carmean dlc at halibut.com
Sat May 21 20:46:40 CEST 2005


Here's where I am after a few hours of learning to use Ploticus 
and writing a couple of Perl scripts to parse the log I've been 
keeping of message spamicity scores:

    http://www.halibut.com/~dlc/tmp/bogodata.png
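
For anyone who wants to try the parsing step before the real scripts 
are posted, here is a minimal sketch in Python (not the Perl scripts 
mentioned above). It assumes the log holds one X-Bogosity header line 
per message in the form bogofilter writes by default; the actual log 
format behind the plot is not described here, and the names 
parse_scores and count_by_label are made up for illustration.

```python
import re
from collections import Counter

# Assumption: each logged line is an "X-Bogosity:" header as bogofilter
# writes it by default. The real log format behind the plot is unknown.
HEADER_RE = re.compile(
    r"X-Bogosity:\s*(?P<label>\w+),.*?spamicity=(?P<score>[0-9]+\.[0-9]+)"
)


def parse_scores(lines):
    """Return (label, spamicity) tuples for every line that matches."""
    results = []
    for line in lines:
        match = HEADER_RE.search(line)
        if match:
            results.append((match.group("label"), float(match.group("score"))))
    return results


def count_by_label(scores):
    """Count how many messages fell into each classification."""
    return Counter(label for label, _ in scores)


# Illustrative sample lines, not real data from the log.
sample = [
    "X-Bogosity: Spam, tests=bogofilter, spamicity=0.999987, version=0.94.11",
    "X-Bogosity: Ham, tests=bogofilter, spamicity=0.000012, version=0.94.11",
    "X-Bogosity: Unsure, tests=bogofilter, spamicity=0.512300, version=0.94.11",
    "Subject: not a bogofilter header",
]

if __name__ == "__main__":
    scores = parse_scores(sample)
    print(scores)
    print(count_by_label(scores))
```

Lines that do not match the pattern are skipped silently, which is 
convenient for a mixed log but would hide a format change, so a real 
script might want to count the misses as well.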

You can see where I upgraded from 0.17.2 to 0.94.11 on May 14.  
The large number of green points in the few weeks before that 
shows how I was having more and more trouble with unflagged 
spam. (I had been kind of lazy about keeping up with the training.)

On that day I also purged my wordlist of all tokens more than 
a year old.
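
As a rough illustration of that kind of age-based purge, here is a 
sketch in Python. It is not how bogofilter's own tools do it and not 
the procedure used here: the row layout (token, spam count, ham count, 
YYYYMMDD date string) and the name purge_old_tokens are assumptions 
made for the example.

```python
from datetime import date, timedelta


def purge_old_tokens(rows, today, max_age_days=365):
    """Keep only rows whose last-seen date is within max_age_days of today.

    Each row is assumed to be (token, spam_count, ham_count, "YYYYMMDD").
    This layout is hypothetical, chosen only for the example.
    """
    cutoff = today - timedelta(days=max_age_days)
    kept = []
    for token, spam_count, ham_count, stamp in rows:
        last_seen = date(int(stamp[0:4]), int(stamp[4:6]), int(stamp[6:8]))
        if last_seen >= cutoff:
            kept.append((token, spam_count, ham_count, stamp))
    return kept


# Illustrative rows, not real wordlist contents.
rows = [
    ("viagra", 120, 0, "20050501"),
    ("meeting", 1, 45, "20040514"),
    ("oldword", 3, 2, "20040513"),
]

if __name__ == "__main__":
    for row in purge_old_tokens(rows, today=date(2005, 5, 14)):
        print(row)
```

Passing the reference date in explicitly, rather than reading the 
clock inside the function, makes the cutoff easy to check by hand.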

The points above 1.000 and below 0.000 come from Ploticus's point 
"clustering" feature for scatterplots, which makes multiple coincident 
points more visible by offsetting them a little.
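
To show why that offsetting pushes points past the ends of the 0 to 1 
range, here is a small Python sketch of the general idea. It is not 
Ploticus's actual algorithm; the function name spread_coincident and 
the step size are assumptions chosen for illustration.

```python
from collections import defaultdict


def spread_coincident(points, step=0.01):
    """Offset repeated (x, y) points vertically so each stays visible.

    The first occurrence keeps its position; later duplicates are moved
    alternately above and below it by growing multiples of `step`.
    This is only a sketch of the idea, not what Ploticus does internally.
    """
    seen = defaultdict(int)
    spread = []
    for x, y in points:
        n = seen[(x, y)]          # how many times this exact point appeared
        seen[(x, y)] += 1
        if n == 0:
            offset = 0.0
        else:
            # n = 1 -> +1, 2 -> -1, 3 -> +2, 4 -> -2, ...
            rank = (n + 1) // 2
            sign = 1 if n % 2 else -1
            offset = sign * rank * step
        spread.append((x, y + offset))
    return spread


if __name__ == "__main__":
    # Three messages on the same day, all scored exactly 1.0.
    for point in spread_coincident([(14, 1.0), (14, 1.0), (14, 1.0)]):
        print(point)
```

With several messages scoring exactly 1.0 (or 0.0) on the same date, 
some of the shifted copies necessarily land outside the 0 to 1 range, 
which matches what the plot shows.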

Once I get this all dialed in I'll share my Perl and Ploticus scripts 
with anyone who wants them.




On Fri, May 20, 2005 at 05:41:18PM -0700, Kevin Williams wrote:
> David, actually I "did" have this working on my previous server.  I am
> currently working on rewriting it for my new server.  Basically, I wrote
> a script to parse the spam and ham folders for size, and the
> wordlist.db from bogo.  I don't remember the mbox parser but I think I
> used bogoutil or bogotune to get stats from my wordlist.  The script
> would run nightly and add a row to a SQL DB.  Then I used a PHP
> graphing library to graph my filtered spam, missed spam, ham, and
> total e-mail.  I believe I had the graph granularity down to months.
> It was particularly interesting during the initial training process
> while my spam and ham caches were growing from 100s up to about 2000.
> Once bogo was trained at about 2000 for each, then the graph was
> predictable.  Of course, there are so many techniques for implementing
> bogo that YMMV.
> -Kevin
> 
> On 5/20/05, David Carmean <dlc at halibut.com> wrote:
> > 
> > Has anyone compiled a long-term log of the performance of their bogofilter
> > installation, e.g. timestamped log of spamicity, ham/spam cutoffs, db size,
> > periodic (hourly/daily) ham/spam/unsure/total volumes, etc?
> > 
> > And then plotted it to look for interesting patterns?
> > 
> > I'm playing with Ploticus for the first time today.
> > 
> > _______________________________________________
> > Bogofilter mailing list
> > Bogofilter at bogofilter.org
> > http://www.bogofilter.org/mailman/listinfo/bogofilter
> >


