Finding problem messages

Jonathan Kamens jik at kamens.brookline.ma.us
Thu Apr 22 18:08:58 CEST 2010


Quoted from David Relson:
>>   bogofilter -v -d . -n -B -M nonspam.mbx
>>   bogofilter -v -d . -s -B -M spam.mbx
>>   bogofilter -v -d . -M -I spam.mbx
>>   bogofilter -v -d . -M -I nonspam.mbx

The problem with this approach is that it will not produce the same word
list that bogotune builds internally, at least not if I understand
bogotune correctly.

When bogotune builds an internal word list, it uses half of the messages
fed to it to build the word list and the other half for scoring and
tuning.

I suppose if I knew exactly how bogotune chooses which messages to use 
for the word list and which ones to use for tuning, I could reproduce 
its behavior by hand.  But since I do not know (I do not believe it is 
documented), I can't do that.
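
If the rule turned out to be something as simple as alternating
messages, doing it by hand could look like the sketch below.  The
alternating rule is purely an assumption for illustration, not
bogotune's documented behavior, and split_mbox and the output file
names are hypothetical.

```shell
#!/bin/sh
# Split an mbox file into two halves by alternating messages.
# ASSUMPTION: "alternate messages" is a made-up split rule for
# illustration; it is NOT known to be how bogotune divides its input.
# split_mbox is a hypothetical helper, not part of bogofilter.
split_mbox() {
    # $1 = input mbox
    # $2 = output mbox for the word-list half (messages 1, 3, 5, ...)
    # $3 = output mbox for the tuning half   (messages 2, 4, 6, ...)
    : > "$2"
    : > "$3"
    awk -v wl="$2" -v tune="$3" '
        # In mbox format each message starts with a "From " line.
        # (A body line starting with "From " is normally stored
        # escaped as ">From ", so it does not match here.)
        /^From / { n++ }
        # Skip anything before the first message; send odd-numbered
        # messages to the word-list half, even-numbered to the other.
        n > 0 { print > (n % 2 ? wl : tune) }
    ' "$1"
}
```

With something like

   split_mbox spam.mbx spam-wordlist.mbx spam-tune.mbx
   split_mbox nonspam.mbx nonspam-wordlist.mbx nonspam-tune.mbx

the word-list halves would go through the registration commands quoted
above and the tuning halves through the scoring commands.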

I could read the source code, sure, but it's easier just to wait until I 
have enough ham and spam messages in my real word list that bogotune 
doesn't have to build an internal one.

   jik



