/etc/bogofilter.cf

David Relson relson at osagesoftware.com
Fri Jan 24 13:33:38 CET 2003


pi,

Good suggestions!  It sounds like you have a vision in your head of how 
bogofilter.cf should look.  Would you mind doing the reorganization and 
sending me the result?  I'll review it, of course, and likely change it 
some before releasing it.

Yes, there are some duplications in the file.  In some cases there are 
groups of options that go together, for example the message format options, 
and in other cases it's useful to show alternate forms of an option's 
value.  Also, as you say, it's silly to show both "Y" and "N" examples.

I look forward to seeing a clearer example file.  I had an original goal of 
making it self-documenting so that we don't need to maintain a separate man 
page.

David

P.S.  I'm CC'ing this to the list as others may have ideas to contribute.

At 05:27 AM 1/24/03, Boris 'pi' Piwinger wrote:

>Hi!
>
>I used /etc/bogofilter.cf for my ~/.bogofilter.cf (just
>doing what I wanted to change). I have to say that this file
>is completely intricate. Various options are given more than
>once (I don't really want to know what happens, if you set
>them differently at different places because you don't see
>they are set elsewhere). So I suggest a cleanup.
>
>Let me just quickly go over it.
>
>1) Intro (good)
>
>2) ALGORITHM (do we need some explanation?)
>
>3) BLOCK ON SUBNETS (I don't understand that text)



The tokenizer creates 4 special tokens for each IP address.  Each token is 
a tag followed by the first 4, 3, 2, and 1 octets of the address.  For 
example, 1.2.3.4 would give tokens "url:1.2.3.4", "url:1.2.3", "url:1.2", 
and "url:1".  The idea was to provide more information for identifying 
spammers by IP address and by the subnets they send from.
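
As a rough illustration of that scheme (a sketch only, not bogofilter's 
actual tokenizer, which is written in C), the four tokens could be built 
like this.  The function name ip_prefix_tokens and the default tag argument 
are made up for the example; only the "url:" tag and the 1.2.3.4 example 
come from the description above.

```python
import ipaddress
from typing import List


def ip_prefix_tokens(ip: str, tag: str = "url:") -> List[str]:
    """Return the tag plus the first 4, 3, 2, and 1 octets of an IPv4 address.

    Hypothetical helper for illustration; it is not part of bogofilter.
    """
    # IPv4Address raises a ValueError subclass for malformed input.
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    # Longest prefix first: full address, then /24-, /16-, /8-style prefixes.
    return [tag + ".".join(octets[:n]) for n in range(4, 0, -1)]


if __name__ == "__main__":
    print(ip_prefix_tokens("1.2.3.4"))
```

Because the shorter prefixes are shared by every address in the same 
subnet, a spammer who moves to a neighbouring address still matches three 
of the four tokens.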

>4) BOGOFILTER_DIR (good)
>
>5) CHARSET info (not clear which effect this has, no need
>for that many other examples, one is enough)
>
>6) REPLACE_NONASCII_CHARACTERS (good, but unclear if affected
>by 5), no need for #replace_nonascii_characters=Y)
>
>7) SPAM_HEADER_NAME (good)
>
>8) MINIMUM DEVIATION (excellent, but no need for
># min_dev=0.2)
>
>9) Robinson Constants (this is seriously misplaced,
>furthermore, it is not clear which algorithms are affected
>by this setting, i.e., if this is needed for Graham or Fisher)

Roughly speaking, Fisher (also called Robinson-Fisher) takes the Robinson 
calculation and does a chi-square test using the Robinson value and the 
number of tokens involved.  As a result, settings for Robinson generally 
apply to Fisher.  Differences that come to mind are:

         ham_cutoff - only applies to Fisher
         numeric formatting options - %e and %d are most useful for
           Robinson-Fisher because it generates so many values that are
           very, very near to 0 and to 1
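
To make the chi-square step a little more concrete, here is a minimal 
sketch of a Fisher-style combination of per-token probabilities.  It is not 
bogofilter's code: the names chi2q and fisher_spamicity are invented for 
the example, and the final (1 + Q - P) / 2 combination is the form commonly 
described for this method, assumed here rather than taken from the source.

```python
import math
from typing import Sequence


def chi2q(x2: float, df: int) -> float:
    """Upper-tail probability of a chi-square value; df must be even and > 0."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    m = x2 / 2.0
    term = math.exp(-m)  # underflows to 0.0 for very large x2
    total = term
    for i in range(1, df // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def fisher_spamicity(token_probs: Sequence[float]) -> float:
    """Combine per-token spam probabilities, each strictly between 0 and 1."""
    n = len(token_probs)
    if n == 0:
        return 0.5  # no evidence either way
    ln_spam = sum(math.log(p) for p in token_probs)
    ln_ham = sum(math.log(1.0 - p) for p in token_probs)
    p_stat = chi2q(-2.0 * ln_ham, 2 * n)   # small when tokens look spammy
    q_stat = chi2q(-2.0 * ln_spam, 2 * n)  # small when tokens look hammy
    return (1.0 + q_stat - p_stat) / 2.0


if __name__ == "__main__":
    print(fisher_spamicity([0.99] * 10))
    print(fisher_spamicity([0.01] * 10))
    print(fisher_spamicity([0.5, 0.5, 0.5]))
```

With even a handful of strongly spammy or hammy tokens the result is pushed 
extremely close to 1 or 0, which is why an exponent-style output format is 
handy for these scores.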

>10) STATS_IN_HEADER (good, but no need for #stats_in_header=N)
>
>11) CUTOFF Values (misplaced, I'd suggest moving all
>things which define values for specific algorithms to the
>very end; many values here are also listed below; good
>explanation, though)
>
>12) THRESHOLD Values (???)

This'll take a while to explain.  I'll have to send you another message on 
these.


>13) USER_CONFIG_FILE (why this, why not next to 4)?)
>
>14) WORDLIST (I guess it should follow 13))
>
>15) New Message Formatting Options (this needs to be renamed,
>good text)
>
>16) more details for algorithms (should stay here, but
>settings from 9), 11), and 12)? should be merged here)
>
>pi