Bogofilter and reclassifying

David Relson relson at osagesoftware.com
Fri Dec 5 13:32:42 CET 2003


On Fri, 5 Dec 2003 01:44:56 -0800
Nathaniel <nate at nate37.net> wrote:

> Hello,
> 
> I'm just wondering what is the "correct"/best way to reclassify a
> message.
> 
> Currently I have a script that greps through the spam/ham dirs for an
> incorrect X-Bogosity header.  It then rescores the message, checks
> whether it is still incorrect, and if so reclassifies it as spam/ham
> (on the first pass it will -N or -S appropriately), repeating until it
> scores correctly.  It does all this while overwriting the file with
> the new X-Bogosity header.
> 
> Is this correct?  Most howtos I've seen either recreate a wordlist or
> just mark as spam/ham the entire corpus, but I didn't want to maintain
> large corpuses and wanted something fairly efficient...
> 
> Thanks.

Greetings Nathaniel,

When I receive a message that's classified incorrectly, I simply feed it
to bogofilter using -Ns (spam that was registered as ham) or -Sn (ham
that was registered as spam).  For a message that's classified as
unsure, I use -n or -s.  I only train bogofilter _once_ on any message.
Part of the Bayesian ideal is that messages are independent of one
another and each message is represented exactly once in the wordlist.
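
A minimal sketch of that one-shot correction as a shell helper.  The
function names pick_flag and retrain are hypothetical, and the sketch
assumes the caller already knows how the message was classified and
what it really is; only the -Ns, -Sn, -n and -s flags come from the
description above.

```shell
# pick_flag CURRENT ACTUAL
# CURRENT is how the message was classified (spam, ham or unsure),
# ACTUAL is what it really is (spam or ham).
# Prints the bogofilter flag for a single corrective training pass.
# Returns non-zero when no correction applies (for example, the
# message is already classified correctly).
pick_flag() {
    case "$1:$2" in
        ham:spam)    echo "-Ns" ;;  # unregister as ham, register as spam
        spam:ham)    echo "-Sn" ;;  # unregister as spam, register as ham
        unsure:spam) echo "-s"  ;;  # register as spam
        unsure:ham)  echo "-n"  ;;  # register as ham
        *)           return 1   ;;
    esac
}

# retrain FILE CURRENT ACTUAL
# Feeds FILE to bogofilter on stdin exactly once, or does nothing
# when pick_flag finds no correction to make.
retrain() {
    flag=$(pick_flag "$2" "$3") || return 0
    bogofilter "$flag" < "$1"
}

# Example: a spam that was filed as ham
#   retrain /path/to/message ham spam
```

Keeping the flag choice in its own function makes it easy to check the
mapping without touching the wordlist.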

Some people (and some spam filters) will keep training on a message
until it scores the way they want.  That works for them.
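
For comparison, a rough sketch of the "train until it scores right"
approach.  The function name train_until_correct and the cap on the
number of passes are assumptions added for illustration; the sketch
relies on bogofilter's documented classification exit codes (0 for
spam, 1 for ham) and does not inspect the exit status of the
registration step.

```shell
# train_until_correct FILE WANT [MAX_PASSES]
# WANT is "spam" or "ham".  Classifies FILE, and while the result is
# not WANT, registers FILE again as WANT, up to MAX_PASSES times
# (default 5, an arbitrary cap chosen for this sketch).
# Returns 0 once FILE scores as WANT, 1 if the cap is reached,
# 2 on bad arguments.
train_until_correct() {
    file=$1
    want=$2
    max=${3:-5}
    case "$want" in
        spam) reg=-s; want_rc=0 ;;
        ham)  reg=-n; want_rc=1 ;;
        *)    return 2 ;;
    esac
    pass=0
    while [ "$pass" -lt "$max" ]; do
        rc=0
        bogofilter < "$file" || rc=$?      # classify only
        [ "$rc" -eq "$want_rc" ] && return 0
        bogofilter "$reg" < "$file"        # one more training pass
        pass=$((pass + 1))
    done
    return 1
}
```

Note that every extra pass counts the same message again in the
wordlist, which is exactly the trade-off against training once.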

AFAICT there's no single, definitive, "right" way to train a spam
filter.  My advice is to pick a method that works for you, that isn't
labor intensive, and that you're happy with and will keep using.  I do
believe you'll be fine :-)

Hope this helps.

David



