The 2 U's - Unsure and Update [was: wordlist.db problem]
WA Support
support at wildapache.net
Fri Jun 18 17:45:54 CEST 2004
Thank you,
I appreciate all the responses! Yours encapsulates what I need to know,
so double thanks.
David Relson wrote:
>
> Bogofilter's default configuration will classify a message as spam or
> non-spam. The SPAM_CUTOFF parameter is used for this. Messages with
> scores greater than or equal to SPAM_CUTOFF are classified as spam.
> Other messages are classified as ham.
>
> There is also a HAM_CUTOFF parameter. When used, messages must have
> scores less than or equal to HAM_CUTOFF to be classified as ham.
> Messages with scores between HAM_CUTOFF and SPAM_CUTOFF are classified
> as unsure. If you look in /etc/bogofilter.cf, you will see the
> following lines:
>
> #### CUTOFF Values
> #
> # both ham_cutoff and spam_cutoff are allowed.
> # setting ham_cutoff to a non-zero value will
> # enable tristate results (Yes/No/Unsure).
> #
> #ham_cutoff = 0.00
> #spam_cutoff = 0.99
> #
> ## with Yes/No/Unsure output:
> ## ham_cutoff = 0.45
> ## spam_cutoff= 0.99
>
> To turn on Yes/No/Unsure classification, remove the #'s from the last
> two lines.
>
> Once that's done, you may want to set the filtering rules for your mail
> program to include rules like:
>
> if header contains "X-Bogosity: Yes", put in Spam folder
> if header contains "X-Bogosity: Unsure", put in Unsure folder
Will do!
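
For my own notes, here is how I read the cutoff logic and the folder rules above, as a rough Python sketch. This is not bogofilter's code. The function names and the "Inbox" fallback folder are made up for illustration, and I am assuming the header value begins with Yes, No or Unsure as in the rules quoted above.

```python
from email import message_from_string

# Values taken from the commented Yes/No/Unsure example in bogofilter.cf.
HAM_CUTOFF = 0.45
SPAM_CUTOFF = 0.99


def classify(score, ham_cutoff=HAM_CUTOFF, spam_cutoff=SPAM_CUTOFF):
    """Return "Yes", "No" or "Unsure" for a score between 0 and 1."""
    if score >= spam_cutoff:
        return "Yes"
    # A ham_cutoff of zero means two-state mode: anything not spam is ham.
    if ham_cutoff == 0 or score <= ham_cutoff:
        return "No"
    return "Unsure"


def folder_for(raw_message):
    """Pick a folder from the X-Bogosity header (hypothetical folder names)."""
    msg = message_from_string(raw_message)
    bogosity = msg.get("X-Bogosity", "")
    if bogosity.startswith("Yes"):
        return "Spam"
    if bogosity.startswith("Unsure"):
        return "Unsure"
    return "Inbox"
```

With ham_cutoff left at 0.00, classify() never returns "Unsure", which matches the default two-state behaviour described above.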
> The "-u" switch (autoupdate) is used to automatically expand the
> wordlist. When this switch is used and bogofilter classifies a message
> as Spam or Ham, the message's tokens are added to the wordlist with a
> ham/spam tag (as appropriate).
>
> As an example, suppose a new "Refinance now - best Mortgage rates"
> message comes in. It will have some words that bogofilter has seen and
> (probably) some new ones as well. Using '-u' the new words will be
> added to the wordlist so that bogofilter can better recognize the next,
> related message.
>
> If/when you choose to use '-u', you need to be on the lookout for
> classification errors and retrain bogofilter with any messages that have
> been classified incorrectly. An incorrectly classified message that is
> auto-updated _may_ cause bogofilter to make additional classification
> errors in the future. This is the same problem as when you (the sys
> admin) incorrectly register a ham message as spam (or vice versa).
Thanks again, I will look at it with and without the -u switch on
independent systems for a few months. I am sure this has been done
before, but I have to 'see' it in action.
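
To check my understanding of autoupdate, a toy model in Python. The wordlist here is just two counters, not bogofilter's real database format, and autoupdate() and retrain() are names I made up. The only behaviour taken from the description above is that tokens get registered when the verdict is Spam or Ham, and that a wrong verdict has to be corrected afterwards.

```python
from collections import Counter


def autoupdate(wordlist, tokens, verdict):
    """Register a message's tokens, but only for a Yes (spam) or No (ham) verdict."""
    if verdict == "Yes":
        wordlist["spam"].update(set(tokens))
    elif verdict == "No":
        wordlist["ham"].update(set(tokens))
    # "Unsure" messages are not registered; they need manual training.
    return wordlist


def retrain(wordlist, tokens, wrong, right):
    """Correct a misclassified message: move its tokens from `wrong` to `right`."""
    for tok in set(tokens):
        if wordlist[wrong][tok] > 0:
            wordlist[wrong][tok] -= 1
        wordlist[right][tok] += 1
    return wordlist


wordlist = {"spam": Counter(), "ham": Counter()}
autoupdate(wordlist, ["refinance", "mortgage", "rates"], "Yes")
```

If that mortgage message was really ham, retrain(wordlist, tokens, "spam", "ham") undoes the damage in this toy model; with the real program the correction would go through bogofilter's own retraining options.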
Murrah