Finding problem messages
Jonathan Kamens
jik at kamens.brookline.ma.us
Thu Apr 22 18:08:58 CEST 2010
Quoted from David Relson:
>> bogofilter -v -d . -n -B -M nonspam.mbx
>> bogofilter -v -d . -s -B -M spam.mbx
>> bogofilter -v -d . -M -I spam.mbx
>> bogofilter -v -d . -M -I nonspam.mbx
The problem with this approach is that it will not produce the same word
list that bogotune builds internally, at least not if I understand
bogotune correctly.
When bogotune builds an internal word list, it uses half of the messages
fed to it for building the word list, and then it uses the other half of
the messages fed to it for scoring and tuning.
I suppose if I knew exactly how bogotune chooses which messages to use
for the word list and which ones to use for tuning, I could reproduce
its behavior by hand. But since I do not know (I do not believe it is
documented), I can't do that.
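
For illustration, here is a minimal sketch of what doing the split "by
hand" could look like. It assumes an alternating split (every other
message goes to the word-list half, the rest to the tuning half). That
rule is only a guess made for the example; it is not bogotune's actual
selection rule, which is exactly the unknown described above. The names
split_alternating and split_mbox are hypothetical helpers, not part of
bogofilter.

```python
import mailbox


def split_alternating(items):
    """Split a sequence into two halves by alternating items.

    ASSUMPTION: alternating is an arbitrary choice for this sketch;
    it is not known to match how bogotune picks its messages.
    """
    items = list(items)
    return items[0::2], items[1::2]


def split_mbox(src_path, wordlist_path, tuning_path):
    """Copy messages from one mbox into a word-list half and a tuning half.

    Returns the number of messages written to each output file.
    """
    src = mailbox.mbox(src_path)
    wordlist_box = mailbox.mbox(wordlist_path)  # created if missing
    tuning_box = mailbox.mbox(tuning_path)      # created if missing
    try:
        # Iterating over an mbox yields its messages in file order.
        build, tune = split_alternating(src)
        for msg in build:
            wordlist_box.add(msg)
        for msg in tune:
            tuning_box.add(msg)
        wordlist_box.flush()
        tuning_box.flush()
    finally:
        src.close()
        wordlist_box.close()
        tuning_box.close()
    return len(build), len(tune)
```

The word-list halves could then be fed to the registration commands
quoted above, and the tuning halves kept aside for scoring. The result
would only match bogotune's internal word list if bogotune happened to
use the same split.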
I could read the source code, sure, but it's easier just to wait until I
have enough ham and spam messages in my real word list that bogotune
doesn't have to build an internal one.
jik