better Bayesian bogofilter
Greg Louis
glouis at dynamicro.on.ca
Wed Aug 13 14:32:39 CEST 2003
On 20030813 (Wed) at 1400:45 +0200, Boris 'pi' Piwinger wrote:
> Greg Louis wrote:
>
> >> What I'm more interested in knowing is exactly _how_ you plan to keep track
> >> of the ham/spam ratio. One thought that crosses my mind is having a
> >> ".SCORE" token rather like .MSG_COUNT. If I understand your article,
> >> .SCORE needs to be updated for each ham and each spam scored.
> >
> > That was my first idea, yes.
>
> What is wrong with .MSG_COUNT, as long as you make sure you
> don't divide by zero?
>
> > What I intend to do first is implement
> > Eq. #5 with a single parameter that can be set manually
>
> Which would be almost impossible to use for train on error. So
> this part of the test would not work.

No, I train on error, and that's exactly why my training db's
.MSG_COUNTs aren't accurately characteristic of the population's. What
I'll do is measure the proportion of spam in a training batch and use
that until next time. Since I train every couple of weeks, that should
be close enough, given that the accuracy doesn't change drastically
with minor changes in the proportion of spam.
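
The measurement described above could be sketched roughly as follows. This is only an illustration, assuming the spam and ham counts for a batch are already known from hand classification; the function name and the example numbers are hypothetical and are not part of bogofilter.

```python
def spam_proportion(n_spam, n_ham):
    """Return the fraction of spam in a batch, or None for an empty batch."""
    total = n_spam + n_ham
    if total == 0:
        # Nothing to measure; avoids dividing by zero.
        return None
    return n_spam / total


# Hypothetical batch: 300 spam and 700 ham messages.
proportion = spam_proportion(n_spam=300, n_ham=700)
```

The resulting value would be set by hand as the single parameter and reused until the next training run.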
--
| G r e g L o u i s | gpg public key: 0x400B1AA86D9E3E64 |
| http://www.bgl.nu/~glouis | (on my website or any keyserver) |
| http://wecanstopspam.org in signatures helps fight junk email. |