better Bayesian bogofilter

Matthias Andree matthias.andree at gmx.de
Tue Aug 12 16:58:17 CEST 2003


Greg Louis <glouis at dynamicro.on.ca> writes:

> By "more complicated calculation" is meant "equation #5", right?  Yeah,
> just once per token ;)  On the other hand, if you train with every
> message or randomly select errors-and-unsures to keep the ratio right,
> you get to use equation #4, which saves 3 divisions per token over what
> we do now.

I'd think we should go for the code that doesn't care about the ratio of
spam to ham used in training. We'd better avoid optimizations that
depend on the environment or make assumptions about the user.
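Greg's message refers to equations #4 and #5 without quoting them, so the sketch below does not reproduce either one. It assumes a per-token estimate of the form p = (b/nb) / (b/nb + g/ng). Here b and g are the token's spam and ham counts, and nb and ng are the numbers of spam and ham messages trained. The point it illustrates is narrow: dropping the per-class normalization saves divisions, but the result is only correct when nb == ng. Both function names are hypothetical and do not come from bogofilter.

```python
def spamicity_normalized(bad, good, msgs_bad, msgs_good):
    """Assumed ratio-independent estimate: divide each count by the
    number of messages of that class used in training."""
    b = bad / msgs_bad    # fraction of trained spam containing the token
    g = good / msgs_good  # fraction of trained ham containing the token
    if b + g == 0:
        return 0.5        # token never seen: treat as neutral
    return b / (b + g)


def spamicity_equal_ratio(bad, good):
    """Assumed shortcut without normalization. It agrees with
    spamicity_normalized only when msgs_bad == msgs_good."""
    if bad + good == 0:
        return 0.5
    return bad / (bad + good)


# Balanced training (100 spam, 100 ham): both estimates are about 0.75.
print(round(spamicity_normalized(30, 10, 100, 100), 3),
      round(spamicity_equal_ratio(30, 10), 3))
# Skewed training (300 spam, 100 ham), same per-message token frequencies:
# the normalized estimate stays near 0.75, the shortcut drifts to 0.9.
print(round(spamicity_normalized(90, 10, 300, 100), 3),
      round(spamicity_equal_ratio(90, 10), 3))
```

Under balanced training the two functions agree. With a 3:1 spam skew, only the normalized one still reports the same spamicity. The shortcut's answer depends on how the user trained, which is the kind of environmental assumption argued against here.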

Does your code affect "make check" results?

-- 
Matthias Andree
