/etc/bogofilter.cf
David Relson
relson at osagesoftware.com
Fri Jan 24 13:33:38 CET 2003
pi,
Good suggestions! It sounds like you have a vision in your head of how
bogofilter.cf should look. Would you mind doing the reorganization and
sending me the result? I'll review it, of course, and likely change it
some before releasing it.
Yes, there are some duplications in the file. In some cases there are
groups of options that go together, for example the message format options,
and in other cases it's useful to show alternate forms of an option's
value. Also, as you say, it's silly to show both "Y" and "N" examples.
I look forward to seeing a clearer example file. I had an original goal of
making it self-documenting so that we don't need to maintain a separate man
page.
David
P.S. I'm CC'ing this to the list as others may have ideas to contribute.
At 05:27 AM 1/24/03, Boris 'pi' Piwinger wrote:
>Hi!
>
>I used /etc/bogofilter.cf for my ~/.bogofilter.cf (just
>doing what I wanted to change). I have to say that this file
>is completely intricate. Various options are given more than
>once (I don't really want to know what happens, if you set
>them differently at different places because you don't see
>they are set elsewhere). So I suggest a cleanup.
>
>Let me just quickly go over it.
>
>1) Intro (good)
>
>2) ALGORITHM (do we need some explanation?)
>
>3) BLOCK ON SUBNETS (I don't understand that text)
The tokenizer creates 4 special tokens for each IP address. Each token is
a tag followed by the first 4, 3, 2, or 1 octets of the address. For example,
1.2.3.4 would give the tokens "url:1.2.3.4", "url:1.2.3", "url:1.2", and
"url:1". The idea was to provide more information for identifying spammers
by IP address.
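
To make the example concrete, here is a minimal sketch of the token
generation in Python. It is not bogofilter's tokenizer code; the function
name subnet_tokens is made up for illustration, and only the "url:" tag and
the 1.2.3.4 example come from the description above.

```python
def subnet_tokens(ip, tag="url:"):
    """Return the tagged tokens for an IPv4 address, longest prefix first.

    Hypothetical helper: the name and signature are illustrative only.
    """
    octets = ip.split(".")
    # Build prefixes of 4, 3, 2, and 1 octets, each with the tag in front.
    return [tag + ".".join(octets[:n]) for n in range(len(octets), 0, -1)]


print(subnet_tokens("1.2.3.4"))
# ['url:1.2.3.4', 'url:1.2.3', 'url:1.2', 'url:1']
```

A spammer who moves between addresses in the same subnet still shares the
shorter tokens, which is what lets them accumulate a spam score.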
>4) BOGOFILTER_DIR (good)
>
>5) CHARSET info (not clear which effect this has, no need
>for that many other examples, one is enough)
>
>6) REPLACE_NONASCII_CHARACTERS (good, but unclear if affected
>by 5), no need for #replace_nonascii_characters=Y)
>
>7) SPAM_HEADER_NAME (good)
>
>8) MINIMUM DEVIATION (excellent, but no need for
># min_dev=0.2)
>
>9) Robinson Constants (this is seriously misplaced,
>furthermore, it is not clear which algorithms are affected
>by this setting, i.e., if this is needed for Graham or Fisher)
Roughly speaking, Fisher (also called Robinson-Fisher) takes the Robinson
calculation and does a chi-square test using the Robinson value and the
number of tokens involved. As a result, settings for Robinson generally
apply to Fisher. Differences that come to mind are:
ham_cutoff - only applies to Fisher
For numeric formatting options, the %e and %d are most useful for
Robinson-Fisher because it generates so many values that are very, very
near to 0 and to 1.
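
For readers who want to see why the scores cluster near 0 and 1, here is a
rough sketch in Python of Fisher's method for combining per-token
probabilities, in the form commonly described for chi-square spam filters.
It is an assumption about the general technique, not a copy of bogofilter's
code; the names chi2_q and fisher_score are invented for this example, and
each probability is assumed to lie strictly between 0 and 1.

```python
import math


def chi2_q(x, dof):
    """Upper-tail probability of a chi-square value for an even dof."""
    assert dof % 2 == 0, "closed form below only holds for even dof"
    m = x / 2.0
    term = math.exp(-m)
    total = term
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def fisher_score(probs):
    """Combine per-token spam probabilities (each in (0, 1)) into one score."""
    n = len(probs)
    # Evidence for spam: small when the tokens look hammy.
    s = 1.0 - chi2_q(-2.0 * sum(math.log(1.0 - p) for p in probs), 2 * n)
    # Evidence for ham: small when the tokens look spammy.
    h = 1.0 - chi2_q(-2.0 * sum(math.log(p) for p in probs), 2 * n)
    return (1.0 + s - h) / 2.0


# Exponent notation keeps the tiny values readable.
print("%e" % fisher_score([0.01] * 10))
print("%e" % (1.0 - fisher_score([0.99] * 10)))
```

Even ten moderately one-sided tokens push the result very close to 0 or 1,
which is why a fixed-point format tends to show only 0.000000 or 1.000000.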
>10) STATS_IN_HEADER (good, but no need for #stats_in_header=N)
>
>11) CUTOFF Values (misplaced, I'd suggest to have all
>things which define values for specific algorithms to the
>very end, many values here are also listed below, good
>explanation, though)
>
>12) THRESHOLD Values (???)
This'll take a while to explain. I'll have to send you another message on
these.
>13) USER_CONFIG_FILE (why this, why not next to 4)?)
>
>14) WORDLIST (I guess it should follow 13))
>
>15) New Message Formatting Options (this needs to be renamed,
>good text)
>
>16) more details for algorithms (should stay here, but
>settings from 9), 11), and 12)? should be merged here)
>
>pi