better Bayesian bogofilter

Tue Aug 12 16:58:17 CEST 2003

Greg Louis <glouis at dynamicro.on.ca> writes:

> By "more complicated calculation" is meant "equation #5", right?  Yeah,
> just once per token ;)  On the other hand, if you train with every
> message or randomly select errors-and-unsures to keep the ratio right,
> you get to use equation #4, which saves 3 divisions per token over what
> we do now.

I'd think we should go for the code that doesn't care about the ratio of
spam to ham used in training. We'd better avoid optimizations that
depend on the environment or make assumptions about the user.
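To illustrate why ratio independence matters, here is a minimal sketch
that compares a token estimate normalized by per-class message counts
(Graham-style) with one built from raw counts. This is a hypothetical
illustration and does not reproduce bogofilter's actual equations #4 and
#5. The function names are made up for the example. A token that appears
in 10% of spam and 10% of ham should be neutral no matter how much of
each class was trained on:

```python
def ratio_normalized(spam_count, ham_count, n_spam, n_ham):
    """Hypothetical helper: token spamminess with counts normalized by the
    number of training messages in each class (three divisions per token)."""
    b = spam_count / n_spam   # fraction of spam messages containing the token
    g = ham_count / n_ham     # fraction of ham messages containing the token
    return b / (b + g)

def raw_count(spam_count, ham_count):
    """Hypothetical helper: unnormalized estimate, only meaningful if
    training used equal numbers of spam and ham messages."""
    return spam_count / (spam_count + ham_count)

# Balanced training: 100 spam, 100 ham, token in 10% of each
print(ratio_normalized(10, 10, 100, 100))  # 0.5
print(raw_count(10, 10))                   # 0.5

# Unbalanced training: 400 spam, 100 ham, same per-class frequencies
print(ratio_normalized(40, 10, 400, 100))  # 0.5
print(raw_count(40, 10))                   # 0.8, skewed toward spam
```

The cheaper raw-count form only gives the same answer when the training
corpus stays balanced, which is exactly the kind of assumption about the
user's training habits the code should not depend on.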

Does your code affect "make check" results?

-- 
Matthias Andree