Statistics? Graphics?
David Carmean
dlc at halibut.com
Sat May 21 20:46:40 CEST 2005
Here's where I am after a few hours of learning to use Ploticus
and writing a couple of perl scripts to parse the log I've been
keeping of message spamicity scores:
http://www.halibut.com/~dlc/tmp/bogodata.png
You can see where I upgraded from 0.17.2 to 0.94.11 on May 14.
You can also see, from the large number of green points in the few weeks
before that, how I was having more and more trouble with unflagged
spam. (I had been kind of lazy about keeping up with the training.)
On that day I also purged my wordlist of all tokens more than
a year old.
The points above 1.000 and below 0.000 come from Ploticus's point "clustering"
feature for scatterplots, which makes multiple coincident points more visible
by offsetting them a little.
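
To illustrate why offsetting coincident points can push them outside the
0.000 to 1.000 range, here is a minimal sketch in Python. It is not Ploticus's
actual clustering algorithm, which may well differ; the function name
spread_coincident and the step size are made up for the example.

```python
from collections import defaultdict

def spread_coincident(points, step=0.01):
    """Offset repeated (x, y) points vertically so each stays visible.

    Illustration only: the first copy of a point keeps its position and
    later copies alternate above and below it in multiples of `step`.
    """
    seen = defaultdict(int)  # how many times each exact point has appeared
    out = []
    for x, y in points:
        n = seen[(x, y)]
        seen[(x, y)] += 1
        # n=0 -> 0, n=1 -> +1, n=2 -> -1, n=3 -> +2, n=4 -> -2, ...
        k = (n + 1) // 2 if n % 2 else -(n // 2)
        out.append((x, y + k * step))
    return out

# Three messages on the same day, all scored exactly 1.0 (made-up data).
for x, y in spread_coincident([(14, 1.0), (14, 1.0), (14, 1.0)]):
    print(x, round(y, 3))
```

With a scheme like this, the second copy of a point at 1.0 is drawn slightly
above 1.0, and the third copy of a point at 0.0 would be drawn slightly below
0.0, which matches what the plot shows at the edges.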
Once I get this all dialed in I'll share my perl and Ploticus scripts
with all who wish.
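
For anyone who wants to try something similar in the meantime, here is a rough
sketch in Python (not the Perl scripts mentioned above) of parsing a score log
and tallying classifications per day. The log layout is an assumption: one
line per message, with a date, a time, and then an X-Bogosity header line in
the form bogofilter normally writes. The names LINE_RE, parse_log and
daily_counts, and the sample lines, are made up for the example.

```python
import re
from collections import Counter, defaultdict

# Assumed line layout (hypothetical):
#   2005-05-14 10:00:00 X-Bogosity: Spam, tests=bogofilter, spamicity=0.999871, version=0.94.11
LINE_RE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2})\s+\S+\s+"
    r"X-Bogosity:\s*(?P<label>\w+),.*?spamicity=(?P<score>[0-9.]+)"
)

def parse_log(lines):
    """Yield (date, label, spamicity) for each line that matches LINE_RE."""
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            yield m.group("date"), m.group("label"), float(m.group("score"))

def daily_counts(records):
    """Count messages per day and per classification label."""
    counts = defaultdict(Counter)
    for date, label, _score in records:
        counts[date][label] += 1
    return counts

# Made-up sample data, only to show the shape of the input.
sample = [
    "2005-05-14 10:00:00 X-Bogosity: Spam, tests=bogofilter, spamicity=0.999871, version=0.94.11",
    "2005-05-14 10:05:00 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000012, version=0.94.11",
    "2005-05-15 08:30:00 X-Bogosity: Unsure, tests=bogofilter, spamicity=0.520000, version=0.94.11",
]

records = list(parse_log(sample))

# One whitespace-separated row per message, easy to feed to a plotting tool.
for date, label, score in records:
    print(date, label, score)

# Per-day totals by label.
for date, labels in sorted(daily_counts(records).items()):
    print(date, dict(labels))
```

Lines that do not match the assumed layout are skipped silently, so the
pattern would need adjusting to whatever your own log actually looks like.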
On Fri, May 20, 2005 at 05:41:18PM -0700, Kevin Williams wrote:
> David, actually I "did" have this working on my previous server. I am
> currently working on re-writing for my new server. Basically, I wrote
> a script to parse the spam and ham folders for size, and the
> wordlist.db from bogo. I don't remember the mbox parser but I think I
> used bogoutil or bogotune to get stats from my wordlist. The script
> would run nightly and add a row to a sql DB. Then I used a php
> graphing library to graph my filtered spam, missed spam, ham, and
> total e-mail. I believe I had the graph granularity down to months.
> It was particularly interesting during the initial training process
> while my spam and ham caches were growing from 100s up to about 2000.
> Once bogo was trained at about 2000 for each, then the graph was
> predictable. Of course, there are so many techniques for implementing
> bogo that YMMV.
> -Kevin
>
> On 5/20/05, David Carmean <dlc at halibut.com> wrote:
> >
> > Has anyone compiled a long-term log of the performance of their bogofilter
> > installation, e.g. timestamped log of spamicity, ham/spam cutoffs, db size,
> > periodic (hourly/daily) ham/spam/unsure/total volumes, etc?
> >
> > And then plotted it to look for interesting patterns?
> >
> > I'm playing with Ploticus for the first time today.
> >
> > _______________________________________________
> > Bogofilter mailing list
> > Bogofilter at bogofilter.org
> > http://www.bogofilter.org/mailman/listinfo/bogofilter
> >