better Bayesian bogofilter

Greg Louis glouis at dynamicro.on.ca
Wed Aug 13 15:18:54 CEST 2003


On 20030813 (Wed) at 0838:11 -0400, David Relson wrote:

> If we're going to keep accurate info in the wordlist, it needs to be spam 
> and ham counts.  A ratio is not maintainable.  The cost of accuracy is, 
> roughly, lock database, write new counts for .SCORE, unlock the 
> database.

Agreed.  The B' / (B' + G') calculation is easier; if you store the
proportion of spam, then you have to store the value of (B' + G') too
so bogofilter can recalculate, and the calculation is appreciably more
complex.  What worries me is that we may need to limit the counting to
some rolling interval like one or three months, which will complicate
matters not a little.
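
To make the comparison concrete, here is a rough sketch of the two
storage schemes.  It is not bogofilter code: the class names, the
treatment of B' and G' as plain per-token spam and ham counts, and the
0.5 returned for an unseen token are all assumptions made for
illustration.

```python
from dataclasses import dataclass


@dataclass
class CountEntry:
    """Hypothetical wordlist entry that stores raw counts."""
    spam: int = 0  # stands in for B' (assumed to be a plain count)
    ham: int = 0   # stands in for G' (assumed to be a plain count)

    def register(self, is_spam: bool) -> None:
        # Updating is a single increment of one counter.
        if is_spam:
            self.spam += 1
        else:
            self.ham += 1

    def proportion(self) -> float:
        # B' / (B' + G'); 0.5 for an unseen token is an assumed default.
        total = self.spam + self.ham
        return self.spam / total if total else 0.5


@dataclass
class RatioEntry:
    """Hypothetical entry that stores the proportion plus (B' + G')."""
    p: float = 0.5
    total: int = 0  # (B' + G') must be kept, or p cannot be updated

    def register(self, is_spam: bool) -> None:
        # Recover the spam count from p and the total, then recompute p.
        spam = self.p * self.total + (1 if is_spam else 0)
        self.total += 1
        self.p = spam / self.total

    def proportion(self) -> float:
        return self.p if self.total else 0.5


if __name__ == "__main__":
    counts, ratio = CountEntry(), RatioEntry()
    for is_spam in (True, True, False, True):
        counts.register(is_spam)
        ratio.register(is_spam)
    print(counts.proportion(), ratio.proportion())
```

Both entries hold two numbers, but the count version updates with an
integer increment, while the ratio version has to reconstruct the count
in floating point on every update.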

-- 
| G r e g  L o u i s         | gpg public key: 0x400B1AA86D9E3E64 |
|  http://www.bgl.nu/~glouis |   (on my website or any keyserver) |
|  http://wecanstopspam.org in signatures helps fight junk email. |



