better Bayesian bogofilter
Greg Louis
glouis at dynamicro.on.ca
Wed Aug 13 15:18:54 CEST 2003
On 20030813 (Wed) at 0838:11 -0400, David Relson wrote:
> If we're going to keep accurate info in the wordlist, it needs to be spam
> and ham counts. A ratio is not maintainable. The cost of accuracy is,
> roughly, lock database, write new counts for .SCORE, unlock the
> database.
Agreed. With counts stored, the B' / (B' + G') calculation is easier;
if you store the proportion of spam instead, then you have to store the
value of (B' + G') too so bogofilter can recalculate, and the
calculation is appreciably more complex. What I worry about is that we
may need to limit the counting to some rolling interval like one or
three months, which will complicate matters not a little.
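
To make the comparison concrete, here is a rough sketch of the two
storage schemes. It is not bogofilter code: the function names are
made up for illustration, and it assumes B' and G' are simply the
per-token spam and ham counts, with no scaling by message totals.

```python
def spam_fraction(bad, good):
    """Return B' / (B' + G') from raw counts, or None for an unseen token."""
    total = bad + good
    return bad / total if total else None


def register_with_counts(bad, good, is_spam):
    """Storing counts: registering a message is a single increment."""
    return (bad + 1, good) if is_spam else (bad, good + 1)


def register_with_ratio(proportion, total, is_spam):
    """Storing the spam proportion: (B' + G') must be stored as well,
    and B' has to be recovered before the update can be applied."""
    bad = proportion * total          # recover B' from the stored ratio
    if is_spam:
        bad += 1
    total += 1
    return bad / total, total         # new proportion and new (B' + G')
```

Both routes give the same proportion after an update, but the ratio
version needs two stored values and a multiply and divide on every
write, where the count version needs one increment.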
--
| G r e g L o u i s | gpg public key: 0x400B1AA86D9E3E64 |
| http://www.bgl.nu/~glouis | (on my website or any keyserver) |
| http://wecanstopspam.org in signatures helps fight junk email. |