Deb package

David Relson relson at osagesoftware.com
Tue Jan 28 14:34:11 CET 2003


Tom,

After reading your message, I confess I am somewhat confused.  You seem to 
be providing the information that answers some of your own questions, and 
you seem to have formed some impressions that are contrary to the release 
info and examples included with bogofilter.  As always, for the newest 
code, use the rpms or tarball on SourceForge.  Other packages tend to lag 
behind the latest releases.

I've answered a number of the questions and have asked for further 
information where I didn't understand your query.

David

At 06:19 AM 1/28/03, Tom Allison wrote:

>bogofilter version 0.10.1
>
>OK!  A few changes in there to the docs....
>
>And now for a few questions/feedback:
>
> From the manpage:
>
>  Since then, Robinson and others have realized that the S
>        calculation can be further optimized: if a vector of
>        length k contains random, uniformly-distributed probabilities
>        p, then -2 * sum(ln(p)) is distributed as chi-squared
>        with 2k degrees of freedom. This is believed to be the
>        most sensitive test of the hypothesis that the vector of
>        probabilities is, in fact, uniformly distributed. Bogofilter
>        now offers the option of applying this test (known as
>        Fisher's method) to yield P(spam) and P(not spam), and
>        using the difference as the "spamicity" score.
>
>Is this the Robinson-Fisher method that you reference later on in the 
>options?  It's not identified here and there's no explanation as to 
>why/what -f would do differently from -r.

Yes.  Your quote mentions "Fisher's method" and "believed to be the most 
sensitive test...".  It also mentions the "chi-squared" test which is the 
additional test that distinguishes Fisher's method from plain old 
Robinson.  It seems that the manpage section you quoted has all the answers 
to the questions you have asked.
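
To make the quoted description concrete, here is a minimal Python sketch of 
Fisher's method as the manpage describes it.  It is not bogofilter's actual 
code.  The function names are hypothetical, and both the choice of which 
tail counts as spam evidence and the rescaling of the difference into the 
range 0..1 are assumptions made for illustration.

```python
import math


def chi2_survival_even(x, df):
    """P(X > x) for a chi-squared variable with an even number of degrees
    of freedom, using the closed form exp(-x/2) * sum((x/2)^i / i!)."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    m = x / 2.0
    term = math.exp(-m)
    total = term
    for i in range(1, df // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def fisher_combine(probs):
    """Combine k probabilities: -2 * sum(ln(p)) is chi-squared with 2k
    degrees of freedom when the p values are uniformly distributed."""
    if not probs or any(not (0.0 < p < 1.0) for p in probs):
        raise ValueError("need probabilities strictly between 0 and 1")
    stat = -2.0 * sum(math.log(p) for p in probs)
    return chi2_survival_even(stat, 2 * len(probs))


def spamicity(word_probs):
    """Hypothetical score from per-word spam probabilities.
    Assumption: the difference of the two combined values is rescaled
    to 0..1 as (1 + a - b) / 2."""
    spam_evidence = fisher_combine(word_probs)
    ham_evidence = fisher_combine([1.0 - p for p in word_probs])
    return (1.0 + spam_evidence - ham_evidence) / 2.0
```

With this sketch, a message whose words all have spam probabilities near 1 
scores close to 1, and a message with evenly balanced evidence scores 0.5.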

>The -3 option tells bogofilter to use three-state classification
>        for the message, i.e. classify the message as
>        ham, spam, or unsure.  This option is effective only if
>        ham_cutoff is non-zero.
>
>Besides a default in the /etc/bogofilter.rc it might be nice to have a 
>suggested number here:
>
>"...ham_cutoff is non-zero.  (try 0.10)"

The default algorithm is presently Robinson.  Bogofilter's help message 
says this and bogofilter.cf.example includes "algorithm=robinson" and other 
settings consistent with robinson.  If additional statements are needed, 
please indicate what and where.

bogofilter.cf.example also shows several possible option 
combinations.  Under "fisher (with Yes/No/Unsure, i.e. '-3')" you'll see 
"ham_cutoff=0.10" and "spam_cutoff=0.95".  Again I ask, if additional 
statements are needed, please indicate what and where.
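
As a rough illustration of how those two cutoffs produce the Yes/No/Unsure 
result, here is a small Python sketch.  It is not taken from bogofilter's 
source; the function name is hypothetical and the treatment of scores that 
fall exactly on a cutoff is an assumption.

```python
def classify(score, ham_cutoff=0.10, spam_cutoff=0.95):
    """Three-state classification of a spamicity score in 0..1.

    Assumption: a score at or above spam_cutoff is spam and a score at
    or below ham_cutoff is ham.  As the manpage says, the three-state
    behaviour applies only when ham_cutoff is non-zero; with
    ham_cutoff == 0 this falls back to plain spam/ham.
    """
    if score >= spam_cutoff:
        return "spam"
    if ham_cutoff > 0 and score > ham_cutoff:
        return "unsure"
    return "ham"
```

For example, with the cutoffs above a score of 0.50 comes back as "unsure", 
while with ham_cutoff set to 0 the same score is simply "ham".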

>I thought that MIME was going to be decoded.  What killed that 
>idea?  Performance?  What if I'm stubborn and want to do MIME 
>anyways...  I know that there have been some various posts about tools 
>used and methods.  Did anything decisive come from this?

The release notes clearly state "Added mime processing...with decoding" and 
also mention the fixing of multiple problems.  What gives you the 
impression that mime isn't being decoded and has been killed?





