Finding own misclassifications
David Relson
relson at osagesoftware.com
Mon Jul 21 21:19:24 CEST 2003
At 03:11 PM 7/21/03, Boris 'pi' Piwinger wrote:
>Hi!
>
>I modified bogominitrain.pl so that it can save the messages
>used for training. The idea was that mails I had classified
>as ham or spam in error will likely be used for training.
>And actually from about 200 messages each used in one run I
>found about four errors. From those I found other messages.
>For example several mails from Network Solutions were
>classified as spam (not only from their promotional mailing
>list). Those were false positives I had overlooked. There were also
>errors from my first collection, including spam sent to (whitelisted)
>mailing lists that I had failed to delete.
>
>So, with a little trickery, bogofilter can be used to find those
>errors. I believe those errors carry a high price.
pi,
You are correct: misclassified messages are likely to cause bogofilter to
perform poorly.
Most of the messages that are in the wrong mbx file will trigger the
train-on-error logic. Being able to set them aside for examination is
definitely useful.
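To illustrate why this works, here is a rough sketch in Python. It is not how bogominitrain.pl or bogofilter is implemented: ToyFilter is a made-up word-count classifier standing in for the real filter, and train_on_error and the sample corpus are likewise hypothetical. The point is only that a message filed under the wrong label tends to disagree with the filter's verdict, so it lands in the set-aside list along with the genuine errors.

```python
from collections import Counter


class ToyFilter:
    """Hypothetical stand-in for a real filter (NOT bogofilter's algorithm)."""

    def __init__(self):
        self.counts = {"ham": Counter(), "spam": Counter()}

    def classify(self, text):
        # Score a message by how often its words were seen in each class.
        words = text.lower().split()
        spam = sum(self.counts["spam"][w] for w in words)
        ham = sum(self.counts["ham"][w] for w in words)
        return "spam" if spam > ham else "ham"

    def train(self, text, label):
        self.counts[label].update(text.lower().split())


def train_on_error(filt, labelled):
    """Train only on messages the filter gets wrong; return those messages."""
    set_aside = []
    for text, label in labelled:
        if filt.classify(text) != label:
            filt.train(text, label)
            set_aside.append((text, label))  # keep for manual examination
    return set_aside


# Made-up corpus; the third message sits in the wrong mailbox.
corpus = [
    ("cheap pills buy now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("buy cheap pills now", "ham"),  # really spam, filed as ham
    ("agenda for monday meeting", "ham"),
]
set_aside = train_on_error(ToyFilter(), corpus)
```

The set-aside list holds every message that triggered training, so it mixes honest filter mistakes with messages whose label is wrong. It is a short list to review by hand, not a verdict.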
By the way, I've noticed the same problem with messages from Network
Solutions. My favorite is the one with "Subject: earn 1,000 free miles".
David