classification error ???
Eric Seppanen
eds at reric.net
Fri Sep 13 19:41:15 CEST 2002
On Fri, Sep 13, 2002 at 01:31:56PM -0400, David Relson wrote:
>
> I've been reading the code, and I think the problem is much worse than
> that. Here's what I think function bogofilter() does:
<snip>
> The problem is that NO determination is made about whether the word is
> interesting, i.e. a strong indicator of spamness or goodness. The loop
> simply puts the FIRST 15 words into the stats.extrema array.
Not exactly true. The first 15 words go in automatically, but the 16th,
17th, ..., Nth words each get checked to see whether they're more
"interesting" (interesting => farther from 0.5), and only the most
interesting words are kept.
At least, that's how it's supposed to work. I hope somebody would've
noticed by now if it wasn't.
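To make the intended behaviour concrete, here is a rough sketch of that selection rule. It is written in Python for brevity and is not bogofilter's actual C code; the names select_extrema, interest, and KEEP are made up for illustration, and the only facts taken from the discussion are the limit of 15 and the "farther from 0.5" test.

```python
KEEP = 15  # number of extrema kept, per the discussion (hypothetical constant name)


def interest(prob):
    """Distance of a word's spam probability from the neutral value 0.5."""
    return abs(prob - 0.5)


def select_extrema(word_probs, keep=KEEP):
    """Return up to `keep` (word, prob) pairs lying farthest from 0.5.

    word_probs is an iterable of (word, prob) pairs in message order.
    This is an illustrative sketch, not the bogofilter implementation.
    """
    if keep <= 0:
        return []
    extrema = []
    for word, prob in word_probs:
        if len(extrema) < keep:
            # The first `keep` words are accepted unconditionally.
            extrema.append((word, prob))
            continue
        # Locate the least interesting entry currently held.
        weakest = min(range(len(extrema)),
                      key=lambda i: interest(extrema[i][1]))
        # Replace it only if the new word is strictly more interesting.
        if interest(prob) > interest(extrema[weakest][1]):
            extrema[weakest] = (word, prob)
    return extrema
```

With this rule, a message that starts with 15 neutral words (probability 0.5) and then contains a word at 0.99 ends up with the 0.99 word in the list in place of one of the neutral ones, so the result is not simply "the first 15 words".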
Try adding enough -v options that you can see how each word is evaluated,
and it should become clearer how it's picking the top 15.