/etc/bogofilter.cf

Boris 'pi' Piwinger 3.14 at logic.univie.ac.at
Fri Jan 24 13:58:46 CET 2003


David Relson wrote:

> Good suggestions!  It sounds like you have a vision in your head of how 
> bogofilter.cf should look.  Would you mind doing the reorganization and 
> sending me the result?  I'll review it, of course, and likely change it 
> some before releasing it.

I'll do it along the lines of my previous mail. Search for @@@
to find the places where more work has to be done.

> I look forward to seeing a clearer example file.  I had an original goal of 
> making it self documenting so that we don't need to maintain a separate man 
> page.

I like this goal and actually I have not looked at any other
documentation while writing the previous mail.

> P.S.  I'm CC'ing this to the list as others may have ideas to contribute.

Sure, no need to CC me, though;-)


I'll move things around heavily. I'll also introduce main
subsections. Some things might go wrong, though;-)


>>1) Intro (good)

I added some text. Standard disclaimer: I am not a native
speaker, so please make sure you proofread everything:-))

>>3) BLOCK ON SUBNETS (I don't understand that text)
> 
> The tokenizer creates 4 special tokens for each IP address.  Each one 
> is a tag followed by the first 4, 3, 2, or 1 octets of the IP address.  For example, 
> 1.2.3.4 would give tokens "url:1.2.3.4", "url:1.2.3", "url:1.2", 
> "url:1".  The idea was to provide more information for identifying spammers 
> by ip address.

I added a line of explanation and made the commented-out example
active. This section goes further down! I take ALGORITHM even
further down!
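
To make the token scheme David describes above concrete, here is a
minimal Python sketch. It is only an illustration, not bogofilter's
tokenizer code; the function name subnet_tokens and its tag parameter
are made up for this example, and the "url:" tag is taken from the
example in the quoted text.

```python
def subnet_tokens(ip, tag="url:"):
    """Return the tag plus the first 4, 3, 2, and 1 octets of an IPv4 address.

    Hypothetical helper for illustration only; not part of bogofilter.
    """
    octets = ip.split(".")
    # Start with all octets and drop the last one on each step.
    return [tag + ".".join(octets[:n]) for n in range(len(octets), 0, -1)]


print(subnet_tokens("1.2.3.4"))
```

The shorter tokens cover whole subnets, so mail from a different host
in the same network still matches some of them.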

>>4) BOGOFILTER_DIR (good)

I made one example active.

Moved
>>13) USER_CONFIG_FILE (why this, why not next to 4)?)
>>
>>14) WORDLIST (I guess it should follow 13))
up.

I changed the headline for 14).


>>5) CHARSET info (not clear which effect this has, no need
>>for that many other examples, one is enough)
>>
>>6) REPLACE_NONASCII_CHARACTERS (good, but unclear if affected
>>by 5), no need for #replace_nonascii_characters=Y)

I joined those.

>>9) Robinson Constants (this is seriously misplaced,
>>furthermore, it is not clear which algorithms are affected
>>by this setting, i.e., if this is needed for Graham or Fisher)
> 
> Roughly speaking, Fisher (also called Robinson-Fisher) takes the Robinson 
> calculation and does a chi-square test using the Robinson value and the 
> number of tokens involved.  As a result, settings for Robinson generally 
> apply to Fisher.

I added a line of explanation.
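
As a rough illustration of the chi-square step David mentions, here is
a minimal Python sketch of Fisher's method for combining per-token
probabilities. It is not taken from bogofilter's source: the function
names are made up for this example, the inputs are assumed to be
probabilities strictly between 0 and 1, and bogofilter's own
calculation may differ in its details.

```python
import math


def chi2_sf_even(x, dof):
    """Upper-tail probability P(X > x) for a chi-square variable.

    Uses the closed form that holds when dof is even:
    exp(-x/2) * sum_{i=0}^{dof/2 - 1} (x/2)^i / i!
    """
    half = x / 2.0
    term = math.exp(-half)  # the i = 0 term
    total = term
    for i in range(1, dof // 2):
        term *= half / i
        total += term
    return min(total, 1.0)


def fisher_combine(probs):
    """Combine n probabilities: -2 * sum(ln p) is tested against
    a chi-square distribution with 2n degrees of freedom."""
    stat = -2.0 * sum(math.log(p) for p in probs)
    return chi2_sf_even(stat, 2 * len(probs))


print(fisher_combine([0.5, 0.5]))
print(fisher_combine([0.01, 0.01]))
```

The result depends on both the individual values and how many tokens
there are, which matches the description of the test using the Robinson
value and the number of tokens involved.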

>>10) STATS_IN_HEADER (good, but no need for #stats_in_header=N)
>>
>>11) CUTOFF Values (misplaced, I'd suggest moving all
>>things which define values for specific algorithms to the
>>very end, many values here are also listed below, good
>>explanation, though)
>>
>>12) THRESHOLD Values (???)
> 
> This'll take a while to explain.  I'll have to send you another message on 
> these.
> 
> 
>>
>>15) New Message Formatting Options (this needs to be renamed,
>>good text)
>>
>>16) more details for algorithms (should stay here, but
>>settings from 9), 11), and 12)? should be merged here)





