bogominitrain

Boris 'pi' Piwinger 3.14 at piology.org
Sat Oct 15 12:49:43 CEST 2005


David Relson <relson at osagesoftware.com> wrote:

>> Does it make sense to train bogofilter with bogominitrain.pl on the same wordlist.db, using the same spam and ham that you used the first time to create that wordlist.db with the -s and -n switches?
>
>No, I don't think it makes sense.  When you initially create a wordlist
>with a bunch of ham and spam, you're (probably) creating a wordlist
>that's larger than it _must_ be.  

That's right. There is also a side effect: if you simply use
bogominitrain for retraining (which will work), you will
probably need many more messages again to balance the many
messages used in the first place.

>Stated differently, if you create the wordlist using an optimal set of
>spam and ham, you'll have all the words needed to do a very good job of
>classification and you'll also have a small wordlist.  You can think of
>this as using an ideal set of messages to create an ideal wordlist.

Ideal would be great;-) But still, it is a good proxy.

>I suspect that bogominitrain's author, Boris "pi" Piwinger, will chime
>in with more info about this tool's merits.

You made such a good case for it that there is hardly anything to
add :-))

pi
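For readers who are new to the idea, here is a minimal sketch of "train on error", which is the general principle behind the small wordlists discussed above. A message is registered only when the current wordlist misclassifies it or is unsure about it. This is a toy word-frequency scorer. It is not bogofilter's algorithm and it is not the actual code of bogominitrain.pl. The names `ToyFilter` and `minimal_train` and the parameters `margin` and `max_passes` are hypothetical and were chosen only for this illustration.

```python
import math
from collections import Counter
from itertools import zip_longest


class ToyFilter:
    """Toy word-frequency classifier for illustration only (NOT bogofilter's method)."""

    def __init__(self):
        self.spam_words = Counter()  # number of spam messages containing each word
        self.ham_words = Counter()   # number of ham messages containing each word
        self.n_spam = 0
        self.n_ham = 0

    def register(self, tokens, is_spam):
        """Add one message to the wordlist (roughly what -s / -n do in bogofilter)."""
        words = set(tokens)
        if is_spam:
            self.spam_words.update(words)
            self.n_spam += 1
        else:
            self.ham_words.update(words)
            self.n_ham += 1

    def score(self, tokens):
        """Return spamminess in (0, 1); 0.5 means no evidence either way."""
        log_odds = 0.0
        for w in set(tokens):
            if self.spam_words[w] + self.ham_words[w] == 0:
                continue  # ignore words the wordlist has never seen
            # Laplace-smoothed fraction of spam / ham messages containing w
            p_spam = (self.spam_words[w] + 1) / (self.n_spam + 2)
            p_ham = (self.ham_words[w] + 1) / (self.n_ham + 2)
            log_odds += math.log(p_spam / p_ham)
        log_odds = max(-50.0, min(50.0, log_odds))  # avoid overflow in exp()
        return 1.0 / (1.0 + math.exp(-log_odds))


def minimal_train(filt, ham, spam, margin=0.1, max_passes=10):
    """Train on error: register a message only if it is misclassified or
    falls within `margin` of 0.5. Repeat passes until nothing is added."""
    # Alternate ham and spam so neither class dominates the early passes.
    labelled = []
    for h, s in zip_longest(ham, spam):
        if h is not None:
            labelled.append((h, False))
        if s is not None:
            labelled.append((s, True))

    registered = 0
    for _ in range(max_passes):
        added = 0
        for tokens, is_spam in labelled:
            p = filt.score(tokens)
            correct = p > 0.5 + margin if is_spam else p < 0.5 - margin
            if not correct:
                filt.register(tokens, is_spam)
                added += 1
        registered += added
        if added == 0:
            break  # every message is now classified with enough margin
    return registered


if __name__ == "__main__":
    ham = [["meeting", "agenda"], ["lunch", "agenda"], ["meeting", "lunch"]] * 10
    spam = [["cheap", "pills"], ["winner", "cheap"], ["pills", "winner"]] * 10
    filt = ToyFilter()
    n = minimal_train(filt, ham, spam)
    print(f"registered {n} of {len(ham) + len(spam)} messages")
```

The sample corpus contains sixty highly repetitive toy messages. From these, only one ham and one spam get registered, yet every message is still classified with a clear margin. In this toy setting, that is the effect David describes: the wordlist keeps what classification needs and little else. The sketch also shows pi's caveat. If the wordlist had first been built from all sixty messages, the counts would be far larger, and later corrections would need correspondingly more messages to shift them.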



More information about the Bogofilter mailing list