Deb package
David Relson
relson at osagesoftware.com
Tue Jan 28 14:34:11 CET 2003
Tom,
After reading your message, I confess I am somewhat confused. You seem to
be providing information that answers some of your own questions, and you
seem to have formed some impressions that are contrary to the release info
and examples included with bogofilter. As always, for the newest code use
the RPMs or tarball on SourceForge. Other packages tend to lag behind the
latest releases.
I've answered a number of the questions and have asked for further
information where I didn't understand your query.
David
At 06:19 AM 1/28/03, Tom Allison wrote:
>bogofilter version 0.10.1
>
>OK! A few changes in there to the docs....
>
>And now for a few questions/feedback:
>
> From the manpage:
>
> Since then, Robinson and others have realized that the S
> calculation can be further optimized: if a vector of
> length k contains random, uniformly-distributed probabilities
> p, then -2 * sum(ln(p)) is distributed as chi-squared
> with 2n degrees of freedom. This is believed to be the
> most sensitive test of the hypothesis that the vector of
> probabilities is, in fact, uniformly distributed. Bogofilter
> now offers the option of applying this test (known as
> Fisher's method) to yield P(spam) and P(not spam), and
> using the difference as the "spamicity" score.
>
>Is this the Robinson-Fisher method that you reference later on in the
>options? It's not identified here and there's no explanation as to
>why/what -f would do differently from -r.
Yes. Your quote mentions "Fisher's method" and "believed to be the most
sensitive test...". It also mentions the "chi-squared" test, which is the
additional test that distinguishes Fisher's method from plain old
Robinson. It seems that the manpage section you quoted has all the answers
to the questions you have asked.
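
For illustration only, here is a minimal Python sketch of the chi-squared
combining step the manpage describes. It is not bogofilter's actual code.
The function names chi2q and fisher_spamicity are hypothetical, and the
final scaling (S - H + 1) / 2 is an assumption taken from the commonly
described form of the Robinson-Fisher combination; the manpage only says
the difference of the two probabilities is used as the score.

```python
import math

def chi2q(x, dof):
    """Upper-tail probability of chi-squared with an even number of
    degrees of freedom, using the closed-form series for even dof."""
    if dof <= 0 or dof % 2 != 0:
        raise ValueError("dof must be a positive even integer")
    m = x / 2.0
    term = math.exp(-m)
    total = term
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    # Guard against rounding pushing the sum slightly above 1.
    return min(total, 1.0)

def fisher_spamicity(probs):
    """Combine per-token spam probabilities (each strictly between 0 and 1).

    -2 * sum(ln(p)) is tested against chi-squared with 2n degrees of
    freedom, once for p and once for 1 - p. The (S - H + 1) / 2 scaling
    is an assumption, not confirmed bogofilter behaviour.
    """
    n = len(probs)
    s = 1.0 - chi2q(-2.0 * sum(math.log(1.0 - p) for p in probs), 2 * n)
    h = 1.0 - chi2q(-2.0 * sum(math.log(p) for p in probs), 2 * n)
    return (s - h + 1.0) / 2.0
```

With this sketch, a message whose tokens all lean strongly spammy scores
near 1, one whose tokens all lean hammy scores near 0, and a message with
no evidence either way (every p equal to 0.5) scores exactly 0.5.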
>The -3 option tells bogofilter to use three-state classification
> for the message, i.e. classify the message as
> ham, spam, or unsure. This option is effective only if
> ham_cutoff is non-zero.
>
>Besides a default in the /etc/bogofilter.rc it might be nice to have a
>suggested number here:
>
>"...ham_cutoff is non-zero. (try 0.10)"
The default algorithm is presently Robinson. Bogofilter's help message
says this and bogofilter.cf.example includes "algorithm=robinson" and other
settings consistent with robinson. If additional statements are needed,
please indicate what and where.
bogofilter.cf.example also shows several possible option
combinations. Under "fisher (with Yes/No/Unsure, i.e. '-3')" you'll see
"ham_cutoff=0.10" and "spam_cutoff=0.95". Again I ask, if additional
statements are needed, please indicate what and where.
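
To make the effect of those two cutoffs concrete, here is a small Python
sketch of three-state classification. It is an illustration, not
bogofilter's implementation: the function name classify is hypothetical,
and the choice of inclusive comparisons at the boundaries is an
assumption. The defaults are the ham_cutoff=0.10 and spam_cutoff=0.95
values shown in bogofilter.cf.example.

```python
def classify(spamicity, ham_cutoff=0.10, spam_cutoff=0.95):
    """Map a spamicity score in [0, 1] to "ham", "spam" or "unsure".

    Per the manpage text quoted above, three-state classification is
    effective only when ham_cutoff is non-zero; with ham_cutoff == 0
    this sketch falls back to a plain ham/spam decision.
    Boundary handling (>= and <=) is an assumption.
    """
    if spamicity >= spam_cutoff:
        return "spam"
    if ham_cutoff == 0:
        return "ham"
    if spamicity <= ham_cutoff:
        return "ham"
    return "unsure"
```

For example, a score of 0.50 comes out as "unsure" with the cutoffs
above, but as "ham" if ham_cutoff is set to 0.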
>I thought that MIME was going to be decoded. What killed that
>idea? Performance? What if I'm stubborn and want to do MIME
>anyways... I know that there have been some various posts about tools
>used and methods. Did anything decisive come from this?
The release notes clearly state "Added mime processing...with decoding" and
also mention the fixing of multiple problems. What gives you the
impression that MIME isn't being decoded and has been killed?