training to exhaustion?

Greg Louis glouis at dynamicro.on.ca
Tue Mar 9 13:20:43 CET 2004


On 20040309 (Tue) at 1213:34 +0100, Boris 'pi' Piwinger wrote:

> Also you will
> need to train with ham. Best is to switch between them.
> 
Practical question: If I set up a Tom-Anderson-like experiment (see his
recent posting in praise of repetitive training), and it works, then
after a relatively short time there will be precious few wrongly
classified nonspam among the new test messages.  Almost all the errors
will be unsure spam.  pi, do you occasionally pad your training db with
correctly classified nonspam to rebalance the message counts, or do you
let it get lopsided?

(Anyone who trains on error, repetitively or not, is going to have this
problem; I usually pad before tuning, or whenever my training db gets
to be 10-15% out of balance.)
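
To make the padding rule of thumb concrete, here is a rough sketch in
Python. It is an illustration only, not anything bogofilter provides. The
function names, the 10% default threshold, and the definition of "out of
balance" (the gap between the spam and nonspam message counts, relative
to the larger count) are all assumptions; the message counts would have
to come from your own training db.

```python
def imbalance(spam_count, ham_count):
    """Relative gap between the two message counts (assumed definition)."""
    larger = max(spam_count, ham_count)
    if larger == 0:
        return 0.0
    return abs(spam_count - ham_count) / larger


def ham_to_pad(spam_count, ham_count, threshold=0.10):
    """How many correctly classified nonspam to add to level the counts.

    Returns 0 when nonspam is not the smaller class, or when the gap is
    still under the (hypothetical) threshold.
    """
    if ham_count >= spam_count:
        return 0
    if imbalance(spam_count, ham_count) < threshold:
        return 0
    return spam_count - ham_count


if __name__ == "__main__":
    # Example counts are made up for illustration.
    print(ham_to_pad(1000, 950))
    print(ham_to_pad(1000, 850))
```

With these assumptions, 1000 spam against 950 nonspam is left alone,
while 1000 against 850 calls for padding with 150 nonspam.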

-- 
| G r e g  L o u i s         | gpg public key: 0x400B1AA86D9E3E64 |
|  http://www.bgl.nu/~glouis |   (on my website or any keyserver) |
|  http://wecanstopspam.org in signatures helps fight junk email. |

Header information for this message:
Subject: Re: training to exhaustion?
     To: bogofilter at aotto.com
   From: Greg Louis <glouis at dynamicro.on.ca>