/etc/bogofilter.cf
Boris 'pi' Piwinger
3.14 at logic.univie.ac.at
Fri Jan 24 13:58:46 CET 2003
David Relson wrote:
> Good suggestions! It sounds like you have a vision in your head of how
> bogofilter.cf should look. Would you mind doing the reorganization and
> sending me the result? I'll review it, of course, and likely change it
> some before releasing it.
I'll do it along the lines of my previous mail. Search for @@@
to find the places where more work is needed.
> I look forward to seeing a clearer example file. I had an original goal of
> making it self documenting so that we don't need to maintain a separate man
> page.
I like this goal; in fact, I did not look at any other
documentation while writing my previous mail.
> P.S. I'm CC'ing this to the list as others may have ideas to contribute.
Sure, but there is no need to CC me ;-)
I'll move things around a lot and also introduce main
subsections. Some things might end up wrong, though ;-)
>>1) Intro (good)
I added some text. Standard disclaimer: I am not a native
speaker, so please make sure you proofread everything :-))
>>3) BLOCK ON SUBNETS (I don't understand that text)
>
> The tokenizer creates 4 special tokens for each IP address: a tag
> followed by the first 4, 3, 2, and 1 octets of the address. For example,
> 1.2.3.4 would give tokens "url:1.2.3.4", "url:1.2.3", "url:1.2",
> "url:1". The idea was to provide more information for identifying spammers
> by IP address.
I added a line of explanation and made the comment active.
This section moves down! I'm moving ALGORITHM even further down!
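To make the subnet tokens concrete, here is a minimal Python sketch of the
expansion David describes. It assumes a plain dotted-quad IPv4 address and
the "url:" tag from his example; the function name subnet_tokens is
hypothetical, and bogofilter's real tokenizer may do this differently.

```python
def subnet_tokens(ip, tag="url:"):
    """Return tag + the first 4, 3, 2 and 1 octets of a dotted-quad address.

    Hypothetical helper for illustration only; it assumes IPv4 input.
    """
    octets = ip.split(".")
    if len(octets) != 4 or not all(o.isdigit() for o in octets):
        raise ValueError(f"not a dotted-quad IPv4 address: {ip!r}")
    # Longest prefix first: the full address, then the /24, /16 and /8 parts.
    return [tag + ".".join(octets[:n]) for n in range(4, 0, -1)]


print(subnet_tokens("1.2.3.4"))
# ['url:1.2.3.4', 'url:1.2.3', 'url:1.2', 'url:1']
```

Because every shorter prefix is its own token, a spammer who hops between
addresses in the same subnet still keeps hitting "url:1.2.3" or "url:1.2".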
>>4) BOGOFILTER_DIR (good)
I made one example active.
Moved
>>13) USER_CONFIG_FILE (why this, why not next to 4)?)
>>
>>14) WORDLIST (I guess it should follow 13))
up.
I changed the headline for 14).
>>5) CHARSET info (not clear what effect this has; no need
>>for that many other examples, one is enough)
>>
>>6) REPLACE_NONASCII_CHARACTERS (good, but unclear whether it is
>>affected by 5); no need for #replace_nonascii_characters=Y)
I joined those.
>>9) Robinson Constants (this is seriously misplaced;
>>furthermore, it is not clear which algorithms are affected
>>by this setting, i.e., whether it is needed for Graham or Fisher)
>
> Roughly speaking, Fisher (also called Robinson-Fisher) takes the Robinson
> calculation and does a chi-square test using the Robinson value and the
> number of tokens involved. As a result, settings for Robinson generally
> apply to Fisher as well.
I added a line of explanation.
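For readers who want to see how the Robinson constants carry over to
Fisher, here is a rough Python sketch based on Gary Robinson's published
formulas (as also used by SpamBayes), not on bogofilter's source. Each
token gets a smoothed value f(w) = (s*x + n*p(w)) / (s + n), where n is the
number of training messages containing the token and s and x are the
Robinson constants; Fisher's method then combines the f(w) values with a
chi-square test on 2 * (number of tokens) degrees of freedom. The function
names and default constants are hypothetical, and bogofilter's actual code
may differ in detail.

```python
import math


def robinson_f(p_w, count, s=1.0, x=0.5):
    """Robinson's smoothed token value f(w) = (s*x + count*p(w)) / (s + count).

    count: training messages that contained the token.
    s, x: the "Robinson constants"; these defaults are illustrative only.
    """
    return (s * x + count * p_w) / (s + count)


def chi2q(x2, dof):
    """P(chi^2 >= x2) for an even number of degrees of freedom."""
    if dof % 2:
        raise ValueError("dof must be even")
    m = x2 / 2.0
    term = math.exp(-m)
    total = term
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def fisher_combine(fs):
    """Combine per-token f(w) values (each strictly in (0, 1)) into one score."""
    n = len(fs)
    # Close to 1 when many tokens look spammy (f(w) near 1).
    s = 1.0 - chi2q(-2.0 * sum(math.log(1.0 - f) for f in fs), 2 * n)
    # Close to 1 when many tokens look hammy (f(w) near 0).
    h = 1.0 - chi2q(-2.0 * sum(math.log(f) for f in fs), 2 * n)
    return (1.0 + s - h) / 2.0


# Ten tokens that were each seen in 50 messages, 99% of them spam.
spammy = [robinson_f(0.99, 50) for _ in range(10)]
print(fisher_combine(spammy))  # a score very close to 1
```

Since s and x only enter through f(w), changing them shifts every value fed
into the chi-square step, which is why the Robinson settings matter for
Fisher too.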
>>10) STATS_IN_HEADER (good, but no need for #stats_in_header=N)
>>
>>11) CUTOFF Values (misplaced; I'd suggest moving everything
>>that defines values for specific algorithms to the
>>very end. Many values here are also listed below. Good
>>explanation, though)
>>
>>12) THRESHOLD Values (???)
>
> This'll take a while to explain. I'll have to send you another message on
> these.
>
>
>>
>>15) New Message Formatting Options (this needs to be renamed;
>>good text)
>>
>>16) more details for algorithms (should stay here, but
>>settings from 9), 11), and 12)? should be merged here)