GETTING.STARTED (rev 2)

David Relson relson at osagesoftware.com
Thu Oct 28 00:12:25 CEST 2004


On Wed, 27 Oct 2004 17:21:05 +0200 (CEST)
Boris 'pi' Piwinger wrote:

> David Relson said:
> 
> >> > Eh??  Bogotune uses the wordlist, and the ham and spam corpora you
> >> > specify, and then does a rather exhaustive scan of possible scoring
> >> > parameters to find what gives the best results.  As you know,
> >> > bogotune has minimum requirements for number of messages registered
> >> > in wordlist.db and minimum numbers of messages for the ham and spam
> >> > corpora used in the tuning process.
> >>
> >> Right, so it is not usable for pure train-on-error approaches.
> >
> > Usually when I run bogotune, I start with an empty wordlist and
> > 10K-15K ham and 10K-15K spam.
> 
> This is one approach. Others are listed in the FAQ. Also some people
> don't have huge mail archives.

True.  My archive is larger than most.  In any case, bogotune needs a
large set of messages to do its work, and it expects the wordlist to
represent a large number of messages as well.

I'm thinking of a toe-tune (train-on-error + bogotune) experiment which
will probably use a modified bogotune that doesn't care about the
message count in the wordlist.  It'll be interesting to see if bogotune
runs happily or something goes wrong.
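
To make the idea concrete, here is a rough Python sketch of what such an
experiment might look like.  It is a toy, not bogofilter or bogotune code:
the function names (score, train_on_error, tune) are hypothetical, the
scoring is a plain average of Robinson-smoothed token probabilities rather
than bogofilter's real combining method, and robx, robs and cutoff are
used only as illustrative tuning knobs.  The point is that train-on-error
leaves few messages in the wordlist, and the scan simply does not check
that count.

```python
from collections import Counter
from itertools import product


def score(tokens, ham, spam, n_ham, n_spam, robx, robs):
    """Toy spamicity: mean of Robinson-smoothed per-token probabilities."""
    probs = []
    for t in set(tokens):
        h = ham[t] / max(n_ham, 1)      # fraction of trained ham with token
        s = spam[t] / max(n_spam, 1)    # fraction of trained spam with token
        n = ham[t] + spam[t]            # how often the token has been seen
        p = s / (h + s) if h + s > 0 else robx
        # Robinson's f(w): pull p toward robx when n is small.
        probs.append((robs * robx + n * p) / (robs + n))
    return sum(probs) / len(probs) if probs else robx


def train_on_error(messages, cutoff=0.5, robx=0.5, robs=1.0):
    """Register a message only when the current wordlist misclassifies it."""
    ham, spam = Counter(), Counter()
    n_ham = n_spam = 0
    for tokens, is_spam in messages:
        guess = score(tokens, ham, spam, n_ham, n_spam, robx, robs) >= cutoff
        if guess != is_spam:
            if is_spam:
                spam.update(set(tokens))
                n_spam += 1
            else:
                ham.update(set(tokens))
                n_ham += 1
    return ham, spam, n_ham, n_spam


def tune(messages, wordlist, robx_grid, robs_grid, cutoff_grid):
    """Exhaustive parameter scan; no minimum message count is enforced."""
    ham, spam, n_ham, n_spam = wordlist
    best = None
    for robx, robs, cutoff in product(robx_grid, robs_grid, cutoff_grid):
        errors = sum(
            (score(tokens, ham, spam, n_ham, n_spam, robx, robs) >= cutoff)
            != is_spam
            for tokens, is_spam in messages
        )
        if best is None or errors < best[0]:
            best = (errors, robx, robs, cutoff)
    return best
```

Each message here is a (tokens, is_spam) pair.  Calling train_on_error()
on a corpus and passing the result to tune() along with a few candidate
values for each parameter returns the combination with the fewest
misclassifications.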


