New script to train bogofilter

Boris 'pi' Piwinger 3.14 at
Thu Jul 3 15:04:06 CEST 2003

David Relson wrote:

> I've used it and think I understand it.  First, it creates an index of all 
> the messages.  Then it shuffles them.  Using the shuffled index, it scores 
> each message and trains on errors.

In that order? Mine always scores each message against the
database as it stands after training on the previous messages.
I think randomtrain must do the same.

I don't really see any advantage of shuffling.
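
To make the order concrete, here is a rough Python sketch of
train-on-error as I understand it. ToyFilter, its word-count scoring,
and train_on_error are made up for illustration; this is neither
bogofilter's real scoring nor the actual randomtrain script. The point
is only that each message is scored against the database as it stands
after the messages before it, and is added only when the score is wrong
or undecided.

```python
import random
from collections import Counter


class ToyFilter:
    """Hypothetical word-count database; NOT bogofilter's scoring."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}

    def train(self, text, label):
        # Count each distinct word once per message.
        self.counts[label].update(set(text.lower().split()))

    def classify(self, text):
        words = set(text.lower().split())
        spam = sum(self.counts["spam"][w] for w in words)
        ham = sum(self.counts["ham"][w] for w in words)
        if spam == ham:
            return "unsure"
        return "spam" if spam > ham else "ham"


def train_on_error(messages, seed=None):
    """messages: list of (text, label) pairs, label is 'spam' or 'ham'."""
    # Build an index of all messages, then shuffle the index.
    order = list(range(len(messages)))
    random.Random(seed).shuffle(order)

    filt = ToyFilter()
    trained = 0
    for i in order:
        text, label = messages[i]
        # Score with the database built from the earlier messages only,
        # and train just on misclassified or unsure messages.
        if filt.classify(text) != label:
            filt.train(text, label)
            trained += 1
    return filt, trained
```

Every message is still looked at, but only the ones that score wrong
end up in the wordlists, which is why they stay small.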

> I'm sure it _does_ look at all messages.  Like yours, the resulting 
> wordlists are much smaller.  Seems like a small percentage of ham messages 
> trigger training while a fairly large percentage of spam train.  I don't 
> remember exact percentages, but I'd guess they were approx 10% and 40%.

That would be way more than mine.

