[relson at osagesoftware.com: Re: Bogofilter 0.11.2 config questions]
David Relson
relson at osagesoftware.com
Wed Jun 4 22:29:18 CEST 2003
At 04:09 PM 6/4/03, jerry wrote:
>----- Forwarded message from David Relson <relson at osagesoftware.com> -----
>
>From: David Relson <relson at osagesoftware.com>
>X-Bogosity: Ham, tests=bogofilter, spamicity=3.72e-04, version=0.13.3
> int cnt prob spamicity histogram
> 0.00 35 0.009490 0.007906 ###################################
> 0.10 5 0.126293 0.022912 #####
> 0.20 0 0.000000 0.022912
> 0.30 0 0.000000 0.022912
> 0.40 0 0.000000 0.022912
> 0.50 0 0.000000 0.022912
> 0.60 0 0.000000 0.022912
> 0.70 0 0.000000 0.022912
> 0.80 0 0.000000 0.022912
> 0.90 3 0.999643 0.307055 ###
>Status: RO
>Content-Length: 938
>Lines: 29
>
>
> >Jerry,
>
> >0.13.3 is a big step forwards from 0.11.2. Be sure to read both
> >RELEASE.NOTES-0.12 and RELEASE.NOTES-0.13. They cover the biggest change
> >in each, i.e. the change that called for bumping from 0.11.x to 0.12 and
> >from 0.12.x to 0.13.
> >
> >SourceForge has a source tarball from which you can build. It's the
> >traditional "./configure && make && make install" incantation.
>
> >There's also a "make check" which runs a variety of tests to confirm that
> >bogofilter is running correctly. Historically that's mostly found
> >portability issues with environments other than i586 linux.
>
> >David
>
>
>David
>Man I would call it a giant leap, as you can see, it's in and working
>fine.
>
>Just a couple more questions, I promise.
>I recall reading somewhere that when first using bogofilter and training,
>with 3 state output selected, there will be mails marked unsure.
>Should there be a procmailrc recipe to filter these unsures to a separate
>box,
>then doing the training? Or can I continue using the following
>in my .muttrc:
>
>macro index X "<enter-command>unset wait_key\n<pipe-entry>bogofilter -Ns\n<enter-command>set wait_key\n<delete-message>"
>macro pager X "<enter-command>unset wait_key\n<pipe-entry>bogofilter -Ns\n<enter-command>set wait_key\n<delete-message>"
>
>and achieve the same results?
>
>Here's a mail marked unsure, but it's spam:
>X-Bogosity: Unsure, tests=bogofilter, spamicity=0.914863, version=0.13.3
> int cnt prob spamicity histogram
>0.00 6 0.017857 0.003857 ######
>0.10 1 0.149509 0.009157 #
>0.20 0 0.000000 0.009157
>0.30 0 0.000000 0.009157
>0.40 0 0.000000 0.009157
>0.50 0 0.000000 0.009157
>0.60 0 0.000000 0.009157
>0.70 0 0.000000 0.009157
>0.80 0 0.000000 0.009157
>0.90 44 0.994929 0.633037 ############################################
>
>
>Thank you
>jerry
Jerry,
Imagine you're in the water, wading away from the shore. In a few feet,
suddenly, without notice, the bottom drops off and you're going to need to
start swimming... Are you ready?
Bogofilter has two "cutoff" parameters, i.e. spam_cutoff and ham_cutoff,
that it uses to convert a spam score (which ranges from 0.0 to 1.0) to a
classification, i.e. spam, ham, or unsure. The default values are 0.95 and
0.10, which are conservative, but usable. There's also a min_dev parameter
to direct bogofilter to ignore tokens close to EVEN_ODDS (0.5). When you
first start using bogofilter, its list of known words is smallish and the
frequency of "unsures" is higher. As it gets trained with more ham and
spam, it will do better. Once you have a saved corpus of several thousand
ham and several thousand spam, it becomes reasonable to run tuning scripts
to find parameter values that best fit _your_ email. (This is the deep
water I warned of earlier).
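
To make the cutoff logic concrete, here is a rough Python sketch (not
bogofilter's actual code). The function names are made up for
illustration, and whether a score exactly equal to a cutoff counts as
spam/ham or as unsure is an assumption, as is the exact comparison used
for min_dev. The two scores at the bottom are the ones from the
X-Bogosity headers quoted above.

```python
EVEN_ODDS = 0.5  # a token with this probability says nothing either way


def classify(spamicity, spam_cutoff=0.95, ham_cutoff=0.10):
    """Map a spam score in [0.0, 1.0] to Spam, Ham, or Unsure.

    Defaults are the 0.95 / 0.10 values mentioned in the text.
    Assumption: a score equal to a cutoff falls on the spam/ham side.
    """
    if spamicity >= spam_cutoff:
        return "Spam"
    if spamicity <= ham_cutoff:
        return "Ham"
    return "Unsure"


def significant_tokens(token_probs, min_dev):
    """Keep only tokens at least min_dev away from EVEN_ODDS.

    token_probs is a hypothetical dict of token -> spam probability.
    """
    return {tok: p for tok, p in token_probs.items()
            if abs(p - EVEN_ODDS) >= min_dev}


print(classify(3.72e-04))   # Ham
print(classify(0.914863))   # Unsure
```

With the defaults, the message scored 0.914863 lands between the two
cutoffs, which is why it was tagged Unsure even though it is spam.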
So, for now, I recommend living with the need to train bogofilter when it
can't classify a message (or gets it wrong). If you're not using '-u'
(auto-update), training merely requires running bogofilter with the -n/-s
switch (as appropriate). If using '-u', and the message is scored as ham
(or spam), you'll need to use '-Ns' or '-Sn' to decrement counts in one
list and increment them in the other. Since '-u' doesn't register tokens
from "unsures", for those messages you'll only need '-n' or '-s'.
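
The rules above can be summed up in a small Python sketch. The function
name and its arguments are hypothetical, and returning None for a
correctly classified message is an assumption based on the advice to
train only on unsures and mistakes.

```python
def training_flag(auto_update, scored_as, actually):
    """Pick the bogofilter switch for training on one message.

    auto_update: True if bogofilter normally runs with '-u'.
    scored_as:   "ham", "spam", or "unsure" (bogofilter's verdict).
    actually:    "ham" or "spam" (what the message really is).
    Returns the switch as a string, or None if no training is needed
    (assumption: correctly classified messages are left alone).
    """
    if scored_as == actually:
        return None
    if auto_update and scored_as != "unsure":
        # '-u' already registered the message in the wrong list, so
        # remove it there and add it to the right one.
        return "-Ns" if actually == "spam" else "-Sn"
    # Nothing was registered yet (no '-u', or the message was unsure).
    return "-s" if actually == "spam" else "-n"


print(training_flag(True, "unsure", "spam"))  # -s
print(training_flag(True, "ham", "spam"))     # -Ns
```

For the unsure spam in the question, this gives '-s' rather than the
'-Ns' used in the mutt macro.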
Hope I didn't push you too far from shore :-)
David