The 2 U's - Unsure and Update [was: wordlist.db problem]

WA Support support at wildapache.net
Fri Jun 18 17:45:54 CEST 2004


Thank you,

I appreciate all the responses!  Yours encapsulates what I need to know,
so double thanks.

David Relson wrote:

> 
> Bogofilter's default configuration will classify a message as spam or
> non-spam.  The SPAM_CUTOFF parameter is used for this.  Messages with
> scores greater than or equal to SPAM_CUTOFF are classified as spam.
> Other messages are classified as ham.
> 
> There is also a HAM_CUTOFF parameter.  When used, messages must have
> scores less than or equal to HAM_CUTOFF to be classified as ham.
> Messages with scores between HAM_CUTOFF and SPAM_CUTOFF are classified
> as unsure.  If you look in /etc/bogofilter.cf, you will see the
> following lines:
> 
>   #### CUTOFF Values
>   #
>   #     both ham_cutoff and spam_cutoff are allowed.
>   #     setting ham_cutoff to a non-zero value will
>   #     enable tristate results (Yes/No/Unsure).
>   #
>   #ham_cutoff  = 0.00
>   #spam_cutoff = 0.99
>   #
>   ## with Yes/No/Unsure output:
>   ## ham_cutoff = 0.45
>   ## spam_cutoff= 0.99
> 
> To turn on Yes/No/Unsure classification, remove the #'s from the last
> two lines.
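The cutoff rules above can be restated as a small function. This is a minimal sketch that mirrors the described behaviour (two-state when ham_cutoff is 0, tristate otherwise). It is not bogofilter's actual source, and the function name `classify` is hypothetical.

```python
def classify(score: float, spam_cutoff: float = 0.99, ham_cutoff: float = 0.0) -> str:
    """Map a spamicity score to an X-Bogosity verdict (sketch of the cutoff rules)."""
    if score >= spam_cutoff:
        return "Yes"      # score at or above spam_cutoff: spam
    if ham_cutoff == 0.0:
        return "No"       # two-state mode: everything below spam_cutoff is ham
    if score <= ham_cutoff:
        return "No"       # tristate mode: at or below ham_cutoff is ham
    return "Unsure"       # tristate mode: between the two cutoffs
```

With the default (commented-out) values a score of 0.5 is ham, while with ham_cutoff = 0.45 the same score becomes Unsure.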
> 
> Once that's done, you may want to set the filtering rules for your mail
> program to include rules like:
> 
>   if header contains "X-Bogosity: Yes", put in Spam folder
>   if header contains "X-Bogosity: Unsure", put in Unsure folder
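For mail setups that filter with a script rather than client rules, the two rules above might look roughly like this. It is a sketch only: it assumes the X-Bogosity value starts with the verdict and may be followed by comma-separated fields, and the folder names (including "INBOX" for everything else) and the function `folder_for` are hypothetical.

```python
from email import message_from_string


def folder_for(raw_message: str) -> str:
    """Pick a destination folder from the X-Bogosity header (illustrative only)."""
    msg = message_from_string(raw_message)
    value = msg.get("X-Bogosity") or ""
    # Keep only the verdict before any extra fields, e.g. "Yes, tests=bogofilter".
    verdict = value.split(",", 1)[0].strip()
    if verdict == "Yes":
        return "Spam"
    if verdict == "Unsure":
        return "Unsure"
    return "INBOX"  # ham, or no header at all
```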

Will do!

> The "-u" switch (autoupdate) is used to automatically expand the
> wordlist.  When this switch is used and bogofilter classifies a message
> as Spam or Ham, the message's tokens are added to the wordlist with a
> ham/spam tag (as appropriate).
> 
> As an example, suppose a new "Refinance now - best Mortgage rates"
> message comes in.  It will have some words that bogofilter has seen and
> (probably) some new ones as well.  Using '-u', the new words will be
> added to the wordlist so that bogofilter can better recognize the next,
> related message.
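The auto-update idea can be sketched with an in-memory dictionary standing in for wordlist.db. This assumes a hypothetical token -> [spam_count, ham_count] layout and a deliberately simple tokenizer; bogofilter's real lexer and database format are more elaborate. As described above, Unsure messages are left out of the update.

```python
import re

SPAM, HAM = 0, 1  # hypothetical indexes into [spam_count, ham_count]


def tokenize(text: str) -> set:
    # Simplified tokenizer (assumption): lowercase alphanumeric runs only.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def autoupdate(wordlist: dict, text: str, verdict: str) -> None:
    """Add the message's tokens with the tag matching a Yes/No verdict."""
    if verdict not in ("Yes", "No"):
        return  # Unsure messages are not registered
    tag = SPAM if verdict == "Yes" else HAM
    for token in tokenize(text):
        counts = wordlist.setdefault(token, [0, 0])  # new words start at zero
        counts[tag] += 1


wordlist = {"mortgage": [3, 0], "rates": [2, 1]}
autoupdate(wordlist, "Refinance now - best Mortgage rates", "Yes")
# "refinance", "now" and "best" are now in the wordlist as spam tokens.
```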
> 
> If/when you use '-u', you need to be on the lookout for
> classification errors and retrain bogofilter with any messages that have
> been classified incorrectly.  An incorrectly classified message that is
> auto-updated _may_ cause bogofilter to make additional classification
> errors in the future.  This is the same problem as when you (the sys
> admin) incorrectly register a ham message as spam (or vice versa).
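Retraining a misclassified, auto-updated message amounts to undoing the wrong registration and registering it with the correct tag. Below is a sketch against the same hypothetical token -> [spam_count, ham_count] dictionary and simple tokenizer; the `retrain` helper is illustrative and not a bogofilter API.

```python
import re

SPAM, HAM = 0, 1  # hypothetical indexes into [spam_count, ham_count]


def tokenize(text: str) -> set:
    # Simplified tokenizer (assumption): lowercase alphanumeric runs only.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrain(wordlist: dict, text: str, was: int, should_be: int) -> None:
    """Move a message's tokens from the wrong tag to the correct one."""
    for token in tokenize(text):
        counts = wordlist.setdefault(token, [0, 0])
        counts[was] = max(0, counts[was] - 1)  # undo the wrong registration
        counts[should_be] += 1                 # register with the correct tag


# A ham message that was auto-updated as spam:
wordlist = {"meeting": [1, 0], "agenda": [1, 2]}
retrain(wordlist, "Meeting agenda", was=SPAM, should_be=HAM)
```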

Thanks again, I will look at it with and without the -u switch on
independent systems for a few months.  I am sure this has been done
before, but I have to 'see' it in action.

Murrah


