Finding own misclassifications

David Relson relson at osagesoftware.com
Mon Jul 21 21:19:24 CEST 2003


At 03:11 PM 7/21/03, Boris 'pi' Piwinger wrote:
>Hi!
>
>I modified bogominitrain.pl so that it can save the messages
>used for training. The idea was that mails I had misclassified
>as ham or spam would likely be picked for training. In fact,
>out of roughly 200 messages of each kind used in one run, I
>found about four errors, and those led me to other misclassified
>messages. For example, several mails from Network Solutions had
>been classified as spam (not only ones from their promotional
>mailing list): false positives I had overlooked. There were
>also errors from my first collection, including spam sent to
>(whitelisted) mailing lists that I had failed to delete.
>
>So, with a little trick, bogofilter can be used to find those
>errors. I believe such errors carry a high price.

pi,

You are correct, misclassified messages are likely to cause bogofilter to 
perform poorly.

Most of the messages that are in the wrong mbx file will trigger the 
train-on-error logic.  Being able to set them aside for examination is 
definitely useful.
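The idea can be illustrated outside bogofilter. What follows is a minimal Python sketch, assuming a simplified single-pass train-on-error loop. `Message`, `ToyModel`, `train_on_error` and the 0.5 cutoff are hypothetical stand-ins, not bogominitrain.pl's actual code or bogofilter's scoring. Any message whose score disagrees with the mailbox it was filed in is trained on and also set aside, so a spam misfiled as ham ends up in the review list.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Message:
    text: str
    filed_as_spam: bool  # which mailbox the user filed it in (the "label")


class ToyModel:
    """Hypothetical word-count classifier standing in for bogofilter's database."""

    def __init__(self) -> None:
        self.spam: Counter = Counter()
        self.ham: Counter = Counter()

    def score(self, text: str) -> float:
        # Fraction of token evidence pointing to spam; 0.5 when nothing is known.
        tokens = text.lower().split()
        s = sum(self.spam[t] for t in tokens)
        h = sum(self.ham[t] for t in tokens)
        return 0.5 if s + h == 0 else s / (s + h)

    def train(self, text: str, is_spam: bool) -> None:
        (self.spam if is_spam else self.ham).update(text.lower().split())


def train_on_error(
    messages: List[Message],
    score: Callable[[str], float],
    train: Callable[[str, bool], None],
    cutoff: float = 0.5,  # hypothetical spam cutoff
) -> List[Message]:
    """Train only on messages the model gets wrong; return them for human review."""
    set_aside: List[Message] = []
    for msg in messages:
        predicted_spam = score(msg.text) >= cutoff
        if predicted_spam != msg.filed_as_spam:
            train(msg.text, msg.filed_as_spam)  # train toward the mailbox label
            set_aside.append(msg)  # keep a copy: it may be filed in the wrong mailbox
    return set_aside


# Usage: seed the model, then run over a mailbox containing one misfiled spam.
model = ToyModel()
model.train("earn free miles now", True)
model.train("meeting notes for tomorrow", False)

mailbox = [
    Message("free miles offer", True),        # correctly filed spam
    Message("notes from the meeting", False),  # correctly filed ham
    Message("earn free miles", False),         # spam misfiled in the ham mailbox
]

flagged = train_on_error(mailbox, model.score, model.train)
for m in flagged:
    print("review:", m.text)
```

A flagged message is not necessarily misfiled; a genuinely hard case looks the same to the loop, so the set-aside list is a starting point for examination rather than a verdict.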

By the way, I've noticed the same problem with messages from Network 
Solutions.  My favorite is the one with "Subject:  earn 1,000 free miles".

David
