repetitive training
Greg Louis
glouis at dynamicro.on.ca
Tue Mar 9 12:59:43 CET 2004
On 20040308 (Mon) at 2008:55 -0500, Tom Anderson wrote:
> > discussed methodology and theory, but IMHO what's really needed is more
> > experimentation.
>
> FWIW, my experience with repetitive training has been excellent.
...
> (reward/punishment). To facilitate this, they forward incorrectly
> classified emails as attachments to bfproxy
...
> ("sit"->cookie, "sit"->cookie, etc ;). If the first registration does
> not move the classification into the cutoff zone, then it registers
> again and again either until it does classify correctly, or until an
> arbitrary maximum is reached (default 10 recursions) in case it never
> converges.
Great! A procedure that is easy to reproduce "in the lab" for testing.
I may well run such an experiment.
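For anyone wanting to try this in the lab, the loop Tom describes could be sketched roughly as below. This is only an illustration, not bfproxy's actual code: `train_until_correct`, `classify`, `register`, `ToyFilter` and both cutoff values are hypothetical names and numbers made up for the example. Only the default cap of 10 rounds comes from the quoted text.

```python
from typing import Callable


def train_until_correct(
    message: str,
    is_spam: bool,
    classify: Callable[[str], float],
    register: Callable[[str, bool], None],
    spam_cutoff: float = 0.99,  # assumed value, for illustration only
    ham_cutoff: float = 0.01,   # assumed value, for illustration only
    max_rounds: int = 10,       # the "arbitrary maximum" from the quoted text
) -> int:
    """Register `message` until it scores in the correct zone or the cap is hit.

    Returns the number of registrations performed.
    """
    def correct(score: float) -> bool:
        return score >= spam_cutoff if is_spam else score <= ham_cutoff

    rounds = 0
    while rounds < max_rounds:
        register(message, is_spam)       # always register at least once
        rounds += 1
        if correct(classify(message)):   # stop as soon as it classifies correctly
            break
    return rounds


class ToyFilter:
    """Hypothetical stand-in for a real filter: each registration moves the score by 0.2."""

    def __init__(self) -> None:
        self.score = 0.5

    def classify(self, message: str) -> float:
        return self.score

    def register(self, message: str, is_spam: bool) -> None:
        step = 0.2 if is_spam else -0.2
        self.score = min(1.0, max(0.0, self.score + step))


toy = ToyFilter()
rounds_used = train_until_correct("some misclassified spam", True, toy.classify, toy.register)
```

To run it against a real filter, `classify` and `register` would have to be replaced with calls to whatever that filter provides. The cap matters because a message may never converge, in which case the loop simply stops after `max_rounds` registrations.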
(BTW I think I should point out that up to now, pi and I have _not_
been testing the same procedure and nothing I've written was intended
to claim otherwise. The closest I came to doing things his way was in
the small experiment reported at http://www.bgl.nu/training.html; after
that I was, and still am, more interested in seeing whether repetitive
training has any value _in_my_working_setup_. So when pi says my
experiments up to now don't apply to his technique I don't disagree at
all. The most important disagreement we have is over the value of
reoptimizing parameters after training. I claim it's moderately useful
to do so; pi seems to feel it's too far from what happens in
production. In production, I tend to tune about once every 8 weeks or so
unless something changes the spam population significantly;
then, of course, I'd tune sooner.)
> So far, results have been great. Unsures have been reduced
> substantially. It does not seem to have contributed to any false
> positives, as I haven't received any myself, nor have had any reported.
> False negatives hover around 1-2 per day, unsures 8-10 (down from 30-40
> a few weeks ago).
Is this per user? If so, the results aren't as good as I get ("no" fn
or fp and an average of about 5 unsures/day), but pretty close, and
with what seems to be significantly less effort. (By "no" I mean
unmeasurably few, like one in 120,000 or so; for me that translates to
one in about six weeks.)
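As a rough check on those two figures, and assuming both describe the same mail stream, the implied daily volume can be worked out as follows. The variable names are illustrative only.

```python
messages_per_error = 120_000  # "one in 120,000 or so"
days_per_error = 6 * 7        # "one in about six weeks"

# Messages handled per day if one error in 120,000 takes about six weeks to occur.
implied_daily_volume = messages_per_error / days_per_error

# About 5 unsures/day expressed as a share of that volume.
unsure_fraction = 5 / implied_daily_volume

print(f"about {implied_daily_volume:.0f} messages/day, unsures about {unsure_fraction:.2%}")
```

That works out to a little under 3,000 messages a day, with unsures well below one percent of the traffic.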
--
| G r e g L o u i s | gpg public key: 0x400B1AA86D9E3E64 |
| http://www.bgl.nu/~glouis | (on my website or any keyserver) |
| http://wecanstopspam.org in signatures helps fight junk email. |
Header information for this message:
Subject: Re: repetitive training
To: bogofilter <bogofilter at aotto.com>
From: Greg Louis <glouis at dynamicro.on.ca>