Anybody seen this?
Eric Seppanen
eds at reric.net
Tue Sep 17 23:02:43 CEST 2002
On Tue, Sep 17, 2002 at 04:50:36PM -0400, Paul Tomblin wrote:
> It's an explanation of what the original Paul Graham paper got wrong:
> http://radio.weblogs.com/0101454/stories/2002/09/16/spamDetection.html
I think it'd be just wonderful if we started collecting potential
algorithms and tested them against each other.
Note that there are two algorithms in bogofilter, and this paper deals
with only the second.
The first algorithm outputs a "spamicity" value for a given word, using
information gleaned from two or more wordlists.
The second algorithm combines some of those numbers to output a
"spamicity" value for the whole message.
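
To make the split concrete, here is a rough Python sketch of the two
stages. It follows the formulas in Paul Graham's "A Plan for Spam" as I
understand them, and is not taken from the bogofilter source. The function
names, the dict-based wordlists, and the handling of rare words (Graham
skips them; this sketch just gives them his 0.4 default) are assumptions
made for illustration.

```python
from math import prod  # Python 3.8+


def word_spamicity(word, good_counts, bad_counts, n_good, n_bad):
    """Stage 1: per-word spamicity from two wordlists (hypothetical dicts).

    n_good and n_bad are the message counts behind each wordlist and are
    assumed to be positive.
    """
    g = 2 * good_counts.get(word, 0)  # good counts doubled to bias against false positives
    b = bad_counts.get(word, 0)
    if g + b < 5:
        return 0.4  # simplification: rare and unseen words get the default value
    good_rate = min(1.0, g / n_good)
    bad_rate = min(1.0, b / n_bad)
    p = bad_rate / (good_rate + bad_rate)
    return max(0.01, min(0.99, p))  # clamp so no single word is ever decisive


def message_spamicity(words, good_counts, bad_counts, n_good, n_bad, keep=15):
    """Stage 2: combine the most extreme per-word values into one score."""
    probs = [
        word_spamicity(w, good_counts, bad_counts, n_good, n_bad)
        for w in set(words)
    ]
    # Keep the values farthest from the neutral 0.5.
    probs.sort(key=lambda p: abs(p - 0.5), reverse=True)
    top = probs[:keep]
    spam = prod(top)
    ham = prod(1.0 - p for p in top)
    return spam / (spam + ham)
```

The point of writing it as two functions is that either stage can be
swapped out on its own, so a different combining rule can be tested
against the same per-word values, and vice versa.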
I suspect there's potential for improvement in the first as well if anyone
with a statistical brain wants to think about it.
For summary digest subscription: bogofilter-digest-subscribe at aotto.com