bogotune suggests spam_cutoff of 0?
relson at osagesoftware.com
Thu Apr 22 01:19:13 CEST 2010
On Wed, 21 Apr 2010 09:28:28 -0400
Jonathan Kamens wrote:
> I'm pretty sure that there are no misclassified messages in my ham
> and spam files.
> I can't find out what happens if I remove the high-scoring ham or
> low-scoring spam messages from my files before training, because I
> don't know which messages those are. This is because bogotrain is
> building its own word list rather than using the one I've already
> got, and the scores it's reporting are based on that internal word
> list rather than my actual word list. As a result, there's no way
> for me to figure out which messages it's claiming have those scores.
> I think the short answer is that I need to wait until I've got a large
> enough body of ham and spam messages to be able to train using my
> real word list instead of the one built by bogotune.
> Thanks for your help.
Whether there are misclassified messages or not is certainly for you to
judge. Bogofilter is merely pointing out that there are very high
scoring non-spam and very low scoring spam -- an indication of possible
It shouldn't be too hard to find the problem messages. The following 4
lines will build a wordlist from a pair of mbox files and then score
each message and print its score:
bogofilter -v -d . -n -B -M nonspam.mbx
bogofilter -v -d . -s -B -M spam.mbx
bogofilter -v -d . -M -I spam.mbx
bogofilter -v -d . -M -I nonspam.mbx
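
To turn the scores printed by the last two commands into a list of suspect messages, something like the following rough sketch might help. It assumes each message yields one line of the form "X-Bogosity: Ham, tests=bogofilter, spamicity=0.000123, version=..." and that the lines appear in the same order as the messages in the mbox file; check both against your own output. The names parse_scores and suspects, and the 0.9 / 0.1 limits, are made up for illustration and are not part of bogofilter.

```python
import re

# Assumed shape of each score line, e.g.
# "X-Bogosity: Ham, tests=bogofilter, spamicity=0.000123, version=1.2.1"
SCORE_RE = re.compile(r"X-Bogosity:\s*(\w+).*?spamicity=([0-9.]+)")


def parse_scores(lines):
    """Return (message_number, label, score) for every line that holds a score.

    message_number counts score lines from 1, on the assumption that the
    Nth score line belongs to the Nth message in the mbox file.
    """
    results = []
    for line in lines:
        match = SCORE_RE.search(line)
        if match:
            number = len(results) + 1
            results.append((number, match.group(1), float(match.group(2))))
    return results


def suspects(lines, corpus, ham_limit=0.9, spam_limit=0.1):
    """List (message_number, score) pairs that look out of place.

    corpus is "ham" for output from the non-spam mbox (report high scores)
    or "spam" for output from the spam mbox (report low scores).
    The default limits are illustrative, not bogofilter defaults.
    """
    scored = parse_scores(lines)
    if corpus == "ham":
        return [(n, s) for n, _label, s in scored if s >= ham_limit]
    if corpus == "spam":
        return [(n, s) for n, _label, s in scored if s <= spam_limit]
    raise ValueError("corpus must be 'ham' or 'spam'")
```

For example, after redirecting the output of the fourth command into a file (say nonspam.scores, a hypothetical name), suspects(open("nonspam.scores"), "ham") would list the positions of the high-scoring non-spam messages, which can then be inspected by hand.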