classification error ???

Eric Seppanen eds at reric.net
Fri Sep 13 19:41:15 CEST 2002


On Fri, Sep 13, 2002 at 01:31:56PM -0400, David Relson wrote:
> 
> I've been reading the code, and I think the problem is much worse than 
> that.  Here's what I think function bogofilter() does:
<snip>
> The problem is that NO determination is made about whether the word is 
> interesting, i.e. a strong indicator of spamness or goodness.  The loop 
> simply puts the FIRST 15 words into the stats.extrema array.

Not exactly true.  The first 15 words go in automatically, but the 16th, 
17th, ..., Nth words are each checked to see whether they are more 
"interesting" than the words already kept (interesting => farther from 
0.5), and only the most interesting words are kept.

At least, that's how it's supposed to work.  I hope somebody would've 
noticed by now if it wasn't.
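A minimal sketch of the selection described above, assuming each word already has a spam probability in [0, 1]. The names `extremum`, `offer`, `contains` and `KEEP` are hypothetical. This is not bogofilter's actual `stats.extrema` code, only an illustration of the "first 15 in, then replace the least interesting" rule:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define KEEP 15 /* number of extreme words retained (15 per the thread) */

/* Hypothetical record: a token and its spam probability in [0, 1]. */
typedef struct {
    const char *word;
    double prob;
} extremum;

/* "Interesting" means far from the neutral value 0.5. */
static double interest(double prob)
{
    double d = prob - 0.5;
    return d < 0 ? -d : d;
}

/* Offer one word to the table of the KEEP most interesting words.
 * The first KEEP words go in unconditionally; after that a new word
 * replaces the least interesting entry only if it is more interesting. */
static void offer(extremum *table, size_t *count, const char *word, double prob)
{
    assert(*count <= KEEP);
    if (*count < KEEP) {
        table[*count].word = word;
        table[*count].prob = prob;
        (*count)++;
        return;
    }
    size_t least = 0;
    for (size_t i = 1; i < KEEP; i++)
        if (interest(table[i].prob) < interest(table[least].prob))
            least = i;
    if (interest(prob) > interest(table[least].prob)) {
        table[least].word = word;
        table[least].prob = prob;
    }
}

/* Return 1 if `word` is currently among the kept extrema, else 0. */
static int contains(const extremum *table, size_t count, const char *word)
{
    for (size_t i = 0; i < count; i++)
        if (strcmp(table[i].word, word) == 0)
            return 1;
    return 0;
}
```

Offering every word of a message in turn leaves the KEEP words farthest from 0.5 in the table regardless of arrival order (on ties, the earlier word is kept), so a late but strongly spammy or hammy word still displaces a near-neutral early one.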

Try adding enough -v options that you can see how each word is evaluated; 
it should then be clearer how it picks the top 15.
