Training frustration

Anne Wilson cannewilson at googlemail.com
Sun Feb 17 20:26:39 CET 2008


On Sunday 17 February 2008 18:22:07 Pavel Kankovsky wrote:
> On Mon, 11 Feb 2008, Anne Wilson wrote:
> > You seem surprised that I was cleaning out the 'trained' messages.  I
> > thought it was a bad idea to keep running the same messages through the
> > training.  Am I wrong?
>
> It depends. Look for "training to exhaustion". I myself don't do it. My
> spam corpus is so huge that I can always find multiple independent copies
> of even the most difficult and exotic spam. :)
>
> > I'll keep the -c parameter in the command in future.  What is the reason
> > for it not being the default?
>
> I am not the author of trainbogo.sh. I guess the idea was to make it
> possible to do several passes through the corpus in order to catch cases
> when the classification of a message changes (from correct to incorrect)
> after some other messages have been trained.
>
> --Pavel Kankovsky aka Peak  [ Boycott Microsoft--http://www.vcnet.com/bms ]
> "Resistance is futile. Open your source code and prepare for assimilation."
>
Fair enough.  My wordlist has been built over a considerable time, so I can't
remember when I last saw a wrongly classified message.  The only problem now
is the Unsure ones, as the pattern of spam messages changes.  Inevitable, of
course, but that's why I wanted to understand the 'further training'
technique.  It's clearly working, as the numbers decrease each time I need to
do this.  I save Unsures until I have around 20 in the spam folder, then run
trainbogo again.

Thanks for the help

Anne
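
For readers of the archive, the multi-pass idea discussed above (train only
on messages the filter gets wrong or is unsure about, then go through the
corpus again in case earlier training changed other results) can be sketched
roughly as below. This is only an illustration under stated assumptions:
ToyFilter is a made-up word-count classifier, not bogofilter's algorithm, and
train_to_exhaustion is a hypothetical helper, not what trainbogo.sh actually
does.

```python
import math
from collections import Counter


class ToyFilter:
    """Tiny word-count classifier. A stand-in only, NOT bogofilter's algorithm."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}

    def train(self, text, label):
        # Register every word of the message under the given label.
        self.counts[label].update(text.lower().split())

    def classify(self, text):
        # Sum of log ratios of smoothed spam/ham word counts.
        score = 0.0
        for word in text.lower().split():
            spam_n = self.counts["spam"][word] + 1  # add-one smoothing
            ham_n = self.counts["ham"][word] + 1
            score += math.log(spam_n / ham_n)
        if score > 0.5:
            return "spam"
        if score < -0.5:
            return "ham"
        return "unsure"


def train_to_exhaustion(flt, corpus, max_passes=10):
    """Repeat passes over the corpus, training only on messages that are
    not yet classified correctly (wrong or unsure). Stops when a full pass
    needs no training or after max_passes. Returns the number of messages
    trained in each pass."""
    history = []
    for _ in range(max_passes):
        trained = 0
        for text, label in corpus:
            if flt.classify(text) != label:
                flt.train(text, label)
                trained += 1
        history.append(trained)
        if trained == 0:
            break
    return history


# Hypothetical four-message corpus, purely for illustration.
corpus = [
    ("cheap pills buy now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("buy cheap watches now", "spam"),
    ("lunch on monday", "ham"),
]

flt = ToyFilter()
history = train_to_exhaustion(flt, corpus)
print(history)
```

With this toy corpus the first pass trains on two messages and the second
pass on none, which matches the pattern of the numbers decreasing each time.
The max_passes cap is there because on a real corpus the loop is not
guaranteed to reach a pass with zero errors.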