Finding own misclassifications
    David Relson
    relson at osagesoftware.com
    Mon Jul 21 21:19:24 CEST 2003

At 03:11 PM 7/21/03, Boris 'pi' Piwinger wrote:
>Hi!
>
>I modified bogominitrain.pl so that it can save the messages
>used for training. The idea was that mails I had classified
>as ham or spam in error will likely be used for training.
>And actually from about 200 messages each used in one run I
>found about four errors. From those I found other messages.
>For example several mails from Network Solutions were
>classified as spam (not only from their promotional mailing
>list). These were overlooked false positives. Also there were errors
>from my first collection, including spam to (whitelisted)
>mailing lists that I had failed to delete.
>
>So bogofilter can, with some tricks, be used to find those
>errors. I believe those errors carry a high price.

pi,

You are correct: misclassified messages are likely to cause bogofilter to
perform poorly.

Most of the messages that are in the wrong mbx file will trigger the 
train-on-error logic.  Being able to set them aside for examination is 
definitely useful.
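
As a rough illustration of the idea, here is a minimal Python sketch of
train-on-error that sets aside every message that triggers training. It is
not bogominitrain.pl and not bogofilter's algorithm: ToyFilter, its
token-count scoring, and train_on_error are hypothetical stand-ins chosen
only to keep the example self-contained.

```python
from collections import Counter


class ToyFilter:
    """Hypothetical stand-in for a spam filter (not bogofilter's scoring)."""

    def __init__(self):
        # Per-class token counts learned so far.
        self.counts = {"ham": Counter(), "spam": Counter()}

    def classify(self, text):
        # Sum how often each token was seen in each class; ties go to ham.
        tokens = text.lower().split()
        spam = sum(self.counts["spam"][t] for t in tokens)
        ham = sum(self.counts["ham"][t] for t in tokens)
        return "spam" if spam > ham else "ham"

    def train(self, text, label):
        self.counts[label].update(text.lower().split())


def train_on_error(filt, messages):
    """Train only on messages the filter gets wrong, and return them.

    `messages` is a list of (text, label) pairs, where label names the
    mailbox the message was filed in ("ham" or "spam").
    """
    set_aside = []
    for text, label in messages:
        if filt.classify(text) != label:
            # The filter disagrees with the filing: train and keep a copy.
            filt.train(text, label)
            set_aside.append((text, label))
    return set_aside
```

The returned list mixes messages the filter simply had not learned yet with
ones that sit in the wrong mailbox, so it still needs a look by hand.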

By the way, I've noticed the same problem with messages from Network
Solutions.  My favorite is the one with "Subject:  earn 1,000 free miles".

David
    
    