[relson at osagesoftware.com: Re: Bogofilter 0.11.2 config questions]

David Relson relson at osagesoftware.com
Wed Jun 4 22:29:18 CEST 2003


At 04:09 PM 6/4/03, jerry wrote:
>----- Forwarded message from David Relson <relson at osagesoftware.com> -----
>
>From: David Relson <relson at osagesoftware.com>
>X-Bogosity: Ham, tests=bogofilter, spamicity=3.72e-04, version=0.13.3
>    int  cnt   prob  spamicity histogram
>   0.00   35 0.009490 0.007906 ###################################
>   0.10    5 0.126293 0.022912 #####
>   0.20    0 0.000000 0.022912
>   0.30    0 0.000000 0.022912
>   0.40    0 0.000000 0.022912
>   0.50    0 0.000000 0.022912
>   0.60    0 0.000000 0.022912
>   0.70    0 0.000000 0.022912
>   0.80    0 0.000000 0.022912
>   0.90    3 0.999643 0.307055 ###
>Status: RO
>Content-Length: 938
>Lines: 29
>
>
> >Jerry,
>
> >0.13.3 is a big step forwards from 0.11.2.  Be sure to read both
> >RELEASE.NOTES-0.12 and RELEASE.NOTES-0.13.  They cover the biggest change
> >in each, i.e. the change that called for bumping from 0.11.x to 0.12 and
> >from 0.12.x to 0.13.
> >
> >SourceForge has a source tarball from which you can build.  It's the
> >traditional "./configure && make && make install" incantation.
>
> >There's also a "make check" which runs a variety of tests to confirm that
> >bogofilter is running correctly.  Historically that's mostly found
> >portability issues with environments other than i586 linux.
>
> >David
>
>
>David
>Man I would call it a giant leap, as you can see, it's in and working
>fine.
>
>Just a couple more questions, I promise.
>I recall reading somewhere that when first using bogofilter and training
>with 3-state output selected, there will be mails marked unsure.
>Should there be a procmailrc recipe to filter these unsures to a separate
>box, and then do the training from there? Or can I continue using the
>following in my .muttrc:
>
>macro index X "<enter-command>unset wait_key\n<pipe-entry>bogofilter -Ns\n<enter-command>set wait_key\n<delete-message>"
>macro pager X "<enter-command>unset wait_key\n<pipe-entry>bogofilter -Ns\n<enter-command>set wait_key\n<delete-message>"
>
>and achieve the same results?
>
>Here's a mail marked unsure, but it's spam:
>X-Bogosity: Unsure, tests=bogofilter, spamicity=0.914863, version=0.13.3
>    int  cnt   prob  spamicity histogram
>0.00    6 0.017857 0.003857 ######
>0.10    1 0.149509 0.009157 #
>0.20    0 0.000000 0.009157
>0.30    0 0.000000 0.009157
>0.40    0 0.000000 0.009157
>0.50    0 0.000000 0.009157
>0.60    0 0.000000 0.009157
>0.70    0 0.000000 0.009157
>0.80    0 0.000000 0.009157
>0.90   44 0.994929 0.633037 ############################################
>
>
>Thank you
>jerry

Jerry,

Imagine you're in the water, wading away from the shore.  In a few feet, 
suddenly, without notice, the bottom drops off and you're going to need to 
start swimming...  Are you ready?

Bogofilter has two "cutoff" parameters, i.e. spam_cutoff and ham_cutoff, 
that it uses to convert a spam score (which ranges from 0.0 to 1.0) to a 
classification, i.e. spam, ham, or unsure.  The default values are 0.95 and 
0.10, respectively, which are conservative but usable.  There's also a 
min_dev parameter 
to direct bogofilter to ignore tokens close to EVEN_ODDS (0.5).  When you 
first start using bogofilter, its list of known words is smallish and the 
frequency of "unsures" is higher.  As it gets trained with more ham and 
spam, it will do better.  Once you have a saved corpus of several thousand 
ham and several thousand spam, it becomes reasonable to run tuning scripts 
to find parameter values that best fit _your_ email.  (This is the deep 
water I warned of earlier).
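
To make the cutoff logic concrete, here is a rough sketch in Python of how the two cutoffs turn a score into a classification.  It is not bogofilter's own code: the function name classify() is made up for illustration, and treating a score exactly equal to a cutoff as spam (or ham) is an assumption.  Only the default values 0.95 and 0.10 come from the paragraph above.

```python
# Illustrative sketch only, not bogofilter's implementation.
# The defaults are the values quoted above; the handling of scores that
# land exactly on a cutoff (>= and <=) is an assumption.
SPAM_CUTOFF = 0.95
HAM_CUTOFF = 0.10


def classify(score, spam_cutoff=SPAM_CUTOFF, ham_cutoff=HAM_CUTOFF):
    """Map a spam score in [0.0, 1.0] to 'Spam', 'Ham', or 'Unsure'."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0.0 and 1.0")
    if score >= spam_cutoff:
        return "Spam"
    if score <= ham_cutoff:
        return "Ham"
    # Anything between the two cutoffs is left for the user to decide.
    return "Unsure"


if __name__ == "__main__":
    # The two spamicity values from the X-Bogosity headers quoted above.
    for score in (3.72e-04, 0.914863):
        print(score, classify(score))
```

With the defaults, your 0.914863 message falls between the two cutoffs, which is why it was tagged Unsure even though it is spam.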

So, for now, I recommend living with the need to train bogofilter when it 
can't classify a message (or gets it wrong).  If you're not using '-u' 
(auto-update), training merely requires running bogofilter with the -n/-s 
switch (as appropriate).  If using '-u', and the message is scored as ham 
(or spam), you'll need to use '-Ns' or '-Sn' to decrement counts in one 
list and increment them in the other.  For a message scored as unsure, 
'-u' doesn't register any tokens, so plain '-n' or '-s' is all you need.
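
The rules in the paragraph above can be written down as a small decision function.  This is only a sketch in Python: training_switch() and its arguments are hypothetical names, and returning None when '-u' has already scored a message correctly is my reading of "train when it gets it wrong", not something bogofilter reports.  The switches themselves ('-n', '-s', '-Ns', '-Sn') are the ones named above.

```python
# Illustrative sketch only: picks the bogofilter training switch for one
# message, following the rules described in the paragraph above.
def training_switch(actual, scored_as, auto_update):
    """Return '-n', '-s', '-Ns', '-Sn', or None if no training is needed.

    actual      -- what the message really is: 'spam' or 'ham'
    scored_as   -- what bogofilter said: 'spam', 'ham', or 'unsure'
    auto_update -- True if bogofilter was run with '-u'
    """
    if actual not in ("spam", "ham"):
        raise ValueError("actual must be 'spam' or 'ham'")
    if scored_as not in ("spam", "ham", "unsure"):
        raise ValueError("scored_as must be 'spam', 'ham', or 'unsure'")

    register = "-s" if actual == "spam" else "-n"

    # Without '-u' nothing was registered, and with '-u' an unsure message
    # was not registered either, so a plain register switch is enough.
    if not auto_update or scored_as == "unsure":
        return register

    # Assumption: with '-u', a correctly scored message is already in the
    # right list and needs nothing further.
    if scored_as == actual:
        return None

    # With '-u' and a wrong score: remove from one list, add to the other.
    return "-Ns" if actual == "spam" else "-Sn"


if __name__ == "__main__":
    # A spam message tagged Unsure, with '-u' in use.
    print(training_switch("spam", "unsure", auto_update=True))
```

So a macro that always pipes to 'bogofilter -Ns' fits the case where '-u' registered a spam as ham; for an unsure message the sketch picks '-s' alone.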

Hope I didn't push you too far from shore :-)

David





