repetitive training

Greg Louis glouis at dynamicro.on.ca
Tue Mar 9 12:59:43 CET 2004


On 20040308 (Mon) at 2008:55 -0500, Tom Anderson wrote:

> > discussed methodology and theory, but IMHO what's really needed is more
> > experimentation.
> 
> FWIW, my experience with repetitive training has been excellent.
...
> (reward/punishment).  To facilitate this, they forward incorrectly
> classified emails as attachments to bfproxy
...
> ("sit"->cookie, "sit"->cookie, etc ;).  If the first registration does
> not move the classification into the cutoff zone, then it registers
> again and again either until it does classify correctly, or until an
> arbitrary maximum is reached (default 10 recursions) in case it never
> converges.

Great!  A procedure that is easy to reproduce "in the lab" for testing.
I will very likely run such an experiment.
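
For lab purposes, the loop Tom describes (register, re-score, repeat until
the message lands on the right side of the cutoff or a cap of 10 is hit)
could be sketched roughly as below.  This is only a toy under stated
assumptions: ToyFilter, its smoothed token score, the 0.85/0.15 cutoffs and
the function names are all hypothetical stand-ins, not bogofilter's scoring
or bfproxy's actual code.  Only the cap of 10 comes from the quoted message.

```python
from collections import Counter

SPAM_CUTOFF = 0.85   # illustrative value, not a bogofilter default
HAM_CUTOFF = 0.15    # illustrative value, not a bogofilter default
MAX_ROUNDS = 10      # the "default 10" cap mentioned in the quoted message


class ToyFilter:
    """Hypothetical stand-in for a real filter: per-token registration counts."""

    def __init__(self):
        self.spam = Counter()
        self.ham = Counter()

    def register(self, tokens, is_spam):
        # Count each distinct token once per registration.
        (self.spam if is_spam else self.ham).update(set(tokens))

    def score(self, tokens):
        # Mean of add-one-smoothed per-token spam ratios; unseen tokens give 0.5.
        probs = [
            (self.spam[t] + 1) / (self.spam[t] + self.ham[t] + 2)
            for t in set(tokens)
        ]
        return sum(probs) / len(probs) if probs else 0.5


def classified_correctly(score, is_spam):
    # "Correct" means the score has moved past the cutoff for the true class.
    return score >= SPAM_CUTOFF if is_spam else score <= HAM_CUTOFF


def train_until_correct(filt, tokens, is_spam, max_rounds=MAX_ROUNDS):
    """Register repeatedly; return (rounds_used, converged)."""
    rounds = 0
    while rounds < max_rounds:
        filt.register(tokens, is_spam)
        rounds += 1
        if classified_correctly(filt.score(tokens), is_spam):
            return rounds, True
    return rounds, False  # gave up at the cap without converging
```

With an empty ToyFilter, train_until_correct(ToyFilter(), ["cheap", "pills"],
True) needs several registrations before the score crosses the cutoff; a
message whose tokens were already registered heavily as ham stops at the cap
with converged == False, which is the "never converges" case.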

(BTW I should point out that up to now, pi and I have _not_
been testing the same procedure, and nothing I've written was intended
to claim otherwise.  The closest I came to doing things his way was in
the small experiment reported at http://www.bgl.nu/training.html; after
that I was, and still am, more interested in seeing whether repetitive
training has any value _in_my_working_setup_.  So when pi says my
experiments up to now don't apply to his technique, I don't disagree at
all.  Our most important disagreement is over the value of
reoptimizing parameters after training: I claim it's moderately useful
to do so, while pi seems to feel it's too far from what happens in
production.  In production, I tend to tune about once every 8 weeks
unless something changes the spam population significantly;
then, of course, I'd tune sooner.)

> So far, results have been great.  Unsures have been reduced
> substantially.  It does not seem to have contributed to any false
> positives, as I haven't received any myself, nor have had any reported. 
> False negatives hover around 1-2 per day, unsures 8-10 (down from 30-40
> a few weeks ago).

Is this per user?  If so, the results aren't as good as I get ("no" fn
or fp and an average of about 5 unsures/day), but pretty close, and
with what seems to be significantly less effort.  (By "no" I mean
unmeasurably few, like one in 120,000 or so; for me that translates to
one in about six weeks.)
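
The arithmetic behind those figures can be checked with a short sketch.
The daily volume below is only what the two quoted numbers imply (one error
per 120,000 messages, one error per six weeks); it is not a measured value,
and the function names are hypothetical.

```python
MESSAGES_PER_ERROR = 120_000   # "one in 120,000 or so"
DAYS_PER_ERROR = 6 * 7         # "one in about six weeks"
UNSURES_PER_DAY = 5            # "an average of about 5 unsures/day"


def implied_daily_volume(messages_per_error, days_per_error):
    # Messages per day implied by one error per N messages and per D days.
    return messages_per_error / days_per_error


def unsure_rate(unsures_per_day, daily_volume):
    # Fraction of daily mail that ends up classified as unsure.
    return unsures_per_day / daily_volume


volume = implied_daily_volume(MESSAGES_PER_ERROR, DAYS_PER_ERROR)
print(f"implied volume: about {volume:.0f} messages/day")
print(f"unsure rate: about {unsure_rate(UNSURES_PER_DAY, volume):.3%}")
```

On those assumptions the volume works out to a little under 3,000 messages
a day, with unsures well below one percent of the mail.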

-- 
| G r e g  L o u i s         | gpg public key: 0x400B1AA86D9E3E64 |
|  http://www.bgl.nu/~glouis |   (on my website or any keyserver) |
|  http://wecanstopspam.org in signatures helps fight junk email. |

Header information for this message:
Subject: Re: repetitive training
     To: bogofilter <bogofilter at aotto.com>
   From: Greg Louis <glouis at dynamicro.on.ca>