classification error ???
Mark M. Hoffman
mhoffman at lightlink.com
Fri Sep 13 22:30:04 CEST 2002
* David Relson <relson at osagesoftware.com> [2002-09-13 15:13:03 -0400]:
<snip>
>
> 1. The first word encountered has probability .8 and will go into the
> stats.extrema array.
> 2. The second word has a higher probability, e.g. .85. It will replace
> the first word.
> 3. All the other words have probability less than .8. 14 of them will be
> used to fill the array.
>
> The problem is that the first word should _remain_ in the array, but doesn't.
I stand corrected; that is true. What a nasty bug. The list of extremes
should be kept sorted so that inserting a new one always drops the lowest
(least extreme) one. I'll code it up this weekend unless you beat me to it. :)
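A minimal sketch of that idea, not the actual bogofilter code. It assumes
"extreme" means distance from 0.5 and an array of 15 slots; the function
name keep_extremes and the sample words are made up for illustration. A
min-heap keeps the least extreme entry at the front, so that entry is the
only one a new token can ever displace.

```python
import heapq


def keep_extremes(tokens, size=15):
    """Return up to `size` (word, prob) pairs furthest from 0.5.

    Assumption: a token's "extremeness" is abs(prob - 0.5).
    """
    # Min-heap of (distance, -order, word, prob); heap[0] is always the
    # least extreme entry currently kept. Using -order makes an earlier
    # token win a tie against a later one with the same distance.
    heap = []
    for order, (word, prob) in enumerate(tokens):
        entry = (abs(prob - 0.5), -order, word, prob)
        if len(heap) < size:
            # Free slot available: never evict anything.
            heapq.heappush(heap, entry)
        elif entry > heap[0]:
            # Array full: replace only the least extreme entry.
            heapq.heapreplace(heap, entry)
    # Most extreme first.
    return [(word, prob) for _, _, word, prob in sorted(heap, reverse=True)]


# The scenario from the quoted message: .8 first, then .85, then a run
# of less extreme words (hypothetical values below .8).
tokens = [("first", 0.8), ("second", 0.85)]
tokens += [("w%d" % i, 0.6 + 0.01 * i) for i in range(14)]
kept = keep_extremes(tokens)
```

With this approach "first" stays in the array alongside "second", and the
only word dropped is the least extreme of the others.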
> I think I've spotted a second problem. If you remember, the subject of the
> spam message was "hello babe". "babe" has a spam indication probability of
> .888350. The final 15 include several words with probability of .879424,
> but not "babe".
Could that be explained by the first problem? "babe" was probably replaced
by a subsequent token with a probability of, e.g., 0.99 or 0.01.
Regards,
--
Mark M. Hoffman
mhoffman at lightlink.com
For summary digest subscription: bogofilter-digest-subscribe at aotto.com