New script to train bogofilter
Boris 'pi' Piwinger
3.14 at logic.univie.ac.at
Thu Jul 3 15:04:06 CEST 2003
David Relson wrote:
[randomtrain]
> I've used it and think I understand it. First, it creates an index of all
> the messages. Then it shuffles them. Using the shuffled index, it scores
> each message and trains on errors.
In that order? Mine always scores with the database after
training with previous messages. I think this is what
randomtrain must also do.
I don't really see any advantage of shuffling.
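
To make the order question concrete, here is a rough Python sketch of
train-on-error as discussed above: index the messages, optionally shuffle,
then score each message against whatever has been learned so far and train
only on a misclassification. It is only an illustration under assumptions:
ToyFilter and train_on_error are made-up names, and the token counting is a
stand-in, not bogofilter's actual scoring or randomtrain's actual code.

```python
import random
from collections import Counter


class ToyFilter:
    """Hypothetical stand-in for the wordlists; not bogofilter's real scoring."""

    def __init__(self):
        self.spam = Counter()  # token counts learned from spam
        self.ham = Counter()   # token counts learned from ham

    def is_spam(self, text):
        # Call a message spam only if its tokens were seen more often in spam.
        tokens = text.lower().split()
        spam_hits = sum(self.spam[t] for t in tokens)
        ham_hits = sum(self.ham[t] for t in tokens)
        return spam_hits > ham_hits

    def train(self, text, spam):
        # Add the message's tokens to the matching wordlist.
        (self.spam if spam else self.ham).update(text.lower().split())


def train_on_error(messages, filt, shuffle=True, seed=None):
    """messages is a list of (text, is_spam) pairs.

    Returns how many messages triggered training.
    """
    order = list(range(len(messages)))  # index of all messages
    if shuffle:
        random.Random(seed).shuffle(order)
    trained = 0
    for i in order:
        text, spam = messages[i]
        # Score with the database as it stands after earlier training,
        # and train only when that score is wrong.
        if filt.is_spam(text) != spam:
            filt.train(text, spam)
            trained += 1
    return trained
```

Calling train_on_error(messages, ToyFilter(), shuffle=False) walks the
messages in their original order, so the two orderings can be compared on
the same corpus.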
> I'm sure it _does_ look at all messages. Like yours, the resulting
> wordlists are much smaller. Seems like a small percentage of ham messages
> trigger training while a fairly large percentage of spam train. I don't
> remember exact percentages, but I'd guess they were approx 10% and 40%.
That would be way more than mine.
pi