Anybody seen this?

Eric Seppanen eds at reric.net
Tue Sep 17 23:02:43 CEST 2002


On Tue, Sep 17, 2002 at 04:50:36PM -0400, Paul Tomblin wrote:
> It's an explanation of what the original Paul Graham paper got wrong:
> http://radio.weblogs.com/0101454/stories/2002/09/16/spamDetection.html

I think it'd be just wonderful if we started collecting potential 
algorithms and tested them against each other.

Note that there are two algorithms in bogofilter, and this paper deals 
with only the second.

The first algorithm outputs a "spamicity" value for a given word, using 
information gleaned from two or more wordlists.
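
As a rough illustration of that first step, here is a minimal sketch in Python. It follows the per-word formula from Paul Graham's essay rather than bogofilter's actual code, so treat the details as assumptions: the function name, the dict-based wordlists, the doubling of good counts, the minimum of 5 occurrences, and the 0.01/0.99 clamp all come from the essay or are made up for the example.

```python
def word_spamicity(word, good_counts, bad_counts, ngood, nbad):
    """Estimate how spammy a single word is, Graham-style.

    good_counts / bad_counts: dicts mapping word -> occurrence count
    (hypothetical stand-ins for the two wordlists).
    ngood / nbad: number of good and spam messages seen (must be > 0).
    Returns None when the word is too rare to say anything about.
    """
    g = 2 * good_counts.get(word, 0)  # good counts doubled to bias against false positives
    b = bad_counts.get(word, 0)
    if g + b < 5:
        return None  # not enough evidence for this word

    good_freq = min(1.0, g / ngood)
    bad_freq = min(1.0, b / nbad)
    p = bad_freq / (good_freq + bad_freq)

    # Clamp so that no single word is ever treated as absolute proof.
    return max(0.01, min(0.99, p))
```

A word seen only in spam comes out at 0.99, a word seen mostly in good mail lands near 0, and a word with too few occurrences returns None so the caller can decide what to do with it.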

The second algorithm combines some of those numbers to output a 
"spamicity" value for the whole message.
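
For the second step, a sketch of the combining rule as described in Graham's essay (again, not necessarily what bogofilter does internally): keep the words whose values are furthest from 0.5 and combine them as prod(p) / (prod(p) + prod(1 - p)). The function name and the max_words default of 15 are assumptions taken from the essay, and the inputs are assumed to be already clamped away from 0 and 1.

```python
def message_spamicity(word_probs, max_words=15):
    """Combine per-word spamicity values into one score for a message.

    word_probs: iterable of per-word values, each strictly between 0 and 1.
    max_words: how many of the most "interesting" words to use
    (15 in Graham's essay; an assumption here).
    """
    # The most interesting words are those furthest from the neutral 0.5.
    interesting = sorted(word_probs, key=lambda p: abs(p - 0.5), reverse=True)
    interesting = interesting[:max_words]

    spam = 1.0
    ham = 1.0
    for p in interesting:
        spam *= p
        ham *= 1.0 - p

    # With no words at all, this comes out as a neutral 0.5.
    return spam / (spam + ham)
```

This is the part the linked article takes issue with, so it would be the natural piece to swap out when comparing candidate algorithms against each other.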

I suspect there's potential for improvement in the first as well if anyone 
with a statistical brain wants to think about it.

For summary digest subscription: bogofilter-digest-subscribe at aotto.com


