[PATCH] Better tagging.

David Relson relson at osagesoftware.com
Sun Sep 14 00:14:29 CEST 2003


On 14 Sep 2003 07:49:09 +1000
michael at optusnet.com.au wrote:

> David Relson <relson at osagesoftware.com> writes:
> > Michael,
> > 
> > And an interesting follow up, it is!  The length shows that there's
> > a relatively small number of tokens involved.  The counts show that
> > their usefulness is dependent on min_dev.  Having run bogotune, my
> > min_dev is set at 0.435, so it appears that none of the tokens will
> > contribute to spam scores at my site.  However, bogoutil's "scores"
> > aren't the best, as they don't know about the tuning parameters
> > (specifically ROBS and ROBX).  I'll have to test further (likely add
> > your numbers to a wordlist and score a dummy message with "-vvv"
> > output to see if the tokens matter).
> > 
> > By the way, have you used bogotune?  What parameter set are you
> > using?
> 
> No, haven't used bogotune. My working set is about 200k messages,
> and bogotune looked like it was going to take weeks to finish. :)
> 
> I'm using pure defaults (mindev 0.1).
> 
> Michael.
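The min_dev point in the quoted exchange can be made concrete with a small sketch. It assumes Gary Robinson's smoothed token score f(w) = (s·x + n·p(w)) / (s + n), where s and x play the roles of ROBS and ROBX, n is the number of messages containing the token, and p(w) is taken here as the token's spam frequency relative to its total frequency. It also assumes a token is used only when |f(w) − 0.5| exceeds min_dev. The helper names, the exact p(w) formula, and the boundary handling are illustrative assumptions, not bogofilter's source.

```python
# Sketch under stated assumptions: Robinson-style smoothing and a min_dev cutoff.
# robinson_fw() and contributes() are hypothetical helpers, not bogofilter code.

def robinson_fw(spam_count: int, good_count: int,
                n_spam_msgs: int, n_good_msgs: int,
                robs: float, robx: float) -> float:
    """Smoothed spam score f(w) = (robs*robx + n*p(w)) / (robs + n)."""
    spam_ratio = spam_count / n_spam_msgs if n_spam_msgs else 0.0
    good_ratio = good_count / n_good_msgs if n_good_msgs else 0.0
    # Assumed p(w): the token's spam frequency relative to its total frequency.
    if spam_ratio + good_ratio == 0.0:
        p = robx  # unseen token: fall back to the prior
    else:
        p = spam_ratio / (spam_ratio + good_ratio)
    n = spam_count + good_count  # number of messages containing the token
    if robs + n == 0:
        return robx
    return (robs * robx + n * p) / (robs + n)


def contributes(fw: float, min_dev: float) -> bool:
    """Assumed rule: a token counts only if its score lies outside 0.5 +/- min_dev."""
    return abs(fw - 0.5) > min_dev


if __name__ == "__main__":
    # Illustrative values only: ROBS=1.0, ROBX=0.5, 100 spam and 100 ham messages.
    fw = robinson_fw(3, 0, 100, 100, robs=1.0, robx=0.5)
    print(fw)                      # 0.875
    print(contributes(fw, 0.1))    # True  (default min_dev)
    print(contributes(fw, 0.435))  # False (tuned min_dev)
```

With these illustrative values, a token seen in 3 of 100 spams and no ham scores 0.875: it counts at a min_dev of 0.1 but not at 0.435, which is how a tuned min_dev can silence a whole set of tokens.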

Michael,

Ah, you believe in art, not science!  If I remember the history,
bogofilter's default parameters are based on some tests run by Greg and
me on our respective mail corpora, with a pinch of "this looks
reasonable" thrown into the recipe.  Since bogofilter is doing so well
for so many, either we're very smart or we got very lucky.  Bogotune is
the tool for applying science to setting the parameters, i.e. test a
broad range of values and then narrow in on what works best with the
data being used.
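The "broad range, then narrow in" approach resembles a coarse-to-fine grid search. The one-parameter sketch below only illustrates that idea; it is not bogotune's actual search, which covers several parameters together. `evaluate` is a hypothetical stand-in for whatever error measure (for example, misclassifications on a test corpus) is computed at a given parameter value.

```python
from typing import Callable


def coarse_to_fine(evaluate: Callable[[float], float], lo: float, hi: float,
                   steps: int = 5, rounds: int = 3) -> float:
    """Return the grid point with the lowest evaluate() value after `rounds`
    passes, each pass rescanning a narrower interval around the previous best."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    best = lo
    for _ in range(rounds):
        step = (hi - lo) / (steps - 1)
        grid = [lo + i * step for i in range(steps)]
        best = min(grid, key=evaluate)
        # Narrow to one grid step either side of the best value, within bounds.
        lo, hi = max(lo, best - step), min(hi, best + step)
    return best


if __name__ == "__main__":
    # Hypothetical error curve with its minimum at 0.37.
    error = lambda x: (x - 0.37) ** 2
    print(coarse_to_fine(error, 0.0, 1.0))  # 0.375
```

The narrowing step assumes the error is roughly unimodal over the range; on a bumpy error surface it can settle on a local minimum, which is one reason to start with a broad first pass.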

With your 200k messages, bogotune could conceivably take weeks.  It
might be useful to break the 200k into 10 or 20 equal groups, then
pick several groups and run bogotune on each.  That would reduce the
compute time a lot, and it would be interesting to see whether bogotune
found the tested groups to be comparable, i.e. whether it produced
similar results for each.
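Splitting a large corpus into equal groups might look like the sketch below. It shuffles first (an assumption, so that no group is just one stretch of time) and deals messages round-robin so group sizes differ by at most one. Messages are arbitrary items such as file paths; the helper names are hypothetical, and spam and non-spam would presumably be split separately so every group keeps a similar mix.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_into_groups(messages: Sequence[T], n_groups: int,
                      seed: int = 0) -> List[List[T]]:
    """Shuffle, then deal messages round-robin into n_groups groups whose
    sizes differ by at most one."""
    shuffled = list(messages)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::n_groups] for i in range(n_groups)]


def pick_groups(groups: Sequence[List[T]], k: int, seed: int = 0) -> List[List[T]]:
    """Choose k distinct groups at random to hand to bogotune."""
    return random.Random(seed).sample(list(groups), k)


if __name__ == "__main__":
    # Stand-in corpus: hypothetical message file names.
    corpus = [f"msg{i:06d}" for i in range(200_000)]
    groups = split_into_groups(corpus, 20)
    print([len(g) for g in groups[:3]])  # [10000, 10000, 10000]
    for group in pick_groups(groups, 3):
        print(len(group), group[0])
```

For 200k messages and 20 groups, each group holds 10,000 messages, so running bogotune on three groups means tuning on about 30,000 messages instead of 200,000.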

David

-- 
David Relson                   Osage Software Systems, Inc.
relson at osagesoftware.com       Ann Arbor, MI 48103
www.osagesoftware.com          tel:  734.821.8800




More information about the bogofilter-dev mailing list