/etc/bogofilter.cf
Boris 'pi' Piwinger
3.14 at logic.univie.ac.at
Fri Jan 24 13:58:46 CET 2003
  
David Relson wrote:
> Good suggestions!  It sounds like you have a vision in your head of how 
> bogofilter.cf should look.  Would you mind doing the reorganization and 
> sending me the result?  I'll review it, of course, and likely change it 
> some before releasing it.
I'll do it along the lines of my previous mail. Search for @@@
where more work remains to be done.
> I look forward to seeing a clearer example file.  I had an original goal of 
> making it self documenting so that we don't need to maintain a separate man 
> page.
I like this goal and actually I have not looked at any other
documentation while writing the previous mail.
> P.S.  I'm CC'ing this to the list as others may have ideas to contribute.
Sure, no need to CC me, though;-)
I'll move things around heavily. I'll also introduce main
subsections. Some things might go wrong, though ;-)
>>1) Intro (good)
I added some text. Standard disclaimer: I am not a native
speaker, so please make sure you proofread everything :-))
>>3) BLOCK ON SUBNETS (I don't understand that text)
> 
> The tokenizer creates 4 special tokens for each IPADDRESS.  Each token is 
> a tag followed by the first 4, 3, 2, and 1 octets of the IP address.  For 
> example, 1.2.3.4 would give the tokens "url:1.2.3.4", "url:1.2.3", 
> "url:1.2", and "url:1".  The idea was to provide more information for 
> identifying spammers by IP address.
I added a line of explanation and uncommented the example setting.
This section moves down! I am moving ALGORITHM even further down!
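To make the subnet tokens concrete, here is a minimal Python sketch of the scheme David describes above. It is not bogofilter's tokenizer; the function name subnet_tokens is a hypothetical helper, and the "url:" tag is simply taken from his example.

```python
def subnet_tokens(ip, tag="url:"):
    """Return the 4 tokens for a dotted-quad address: the tag followed by
    the first 4, 3, 2, and 1 octets (hypothetical helper, not bogofilter code)."""
    octets = ip.split(".")
    if len(octets) != 4:
        raise ValueError("expected a dotted-quad IPv4 address: %r" % ip)
    # Longest prefix first, matching the order used in the example above.
    return [tag + ".".join(octets[:n]) for n in range(4, 0, -1)]


if __name__ == "__main__":
    for token in subnet_tokens("1.2.3.4"):
        print(token)
```

The shorter prefixes are what allow a whole subnet to accumulate spam counts even when the individual host address has not been seen before.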
>>4) BOGOFILTER_DIR (good)
I uncommented one example.
Moved
>>13) USER_CONFIG_FILE (why this, why not next to 4)?)
>>
>>14) WORDLIST (I guess it should follow 13))
up.
I changed the headline for 14).
>>5) CHARSET info (not clear what effect this has, no need
>>for that many other examples, one is enough)
>>
>>6) REPLACE_NONASCII_CHARACTERS (good, but unclear if affected
>>by 5), no need for #replace_nonascii_characters=Y)
I joined those.
>>9) Robinson Constants (this is seriously misplaced,
>>furthermore, it is not clear which algorithms are affected
>>by this setting, i.e., if this is needed for Graham or Fisher)
> 
> Roughly speaking, Fisher (also called Robinson-Fisher) takes the Robinson 
> calculation and does a chi-square test using the Robinson value and the 
> number of tokens involved.  As a result, settings for Robinson generally 
> apply to Fisher.
I added a line of explanation.
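As a rough illustration of how a Fisher-style combination builds on per-token Robinson values, here is a Python sketch. It is not bogofilter's implementation: the function names, the chi-square helper (valid only for even degrees of freedom), and the final (1 + S - H) / 2 scaling are assumptions based on the commonly described Robinson-Fisher approach.

```python
import math


def chi2q(x2, df):
    """Upper-tail probability of a chi-square value x2 with df degrees of
    freedom. This closed form only works for even, positive df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even number")
    m = x2 / 2.0
    term = math.exp(-m)
    total = term
    for i in range(1, df // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def fisher_score(token_probs):
    """Combine per-token spam probabilities (each strictly between 0 and 1,
    e.g. Robinson-adjusted values) into one score in [0, 1]."""
    n = len(token_probs)
    if n == 0:
        return 0.5  # assumption: no evidence means "unsure"
    # Fisher's method: -2 * sum(ln p) follows chi-square with 2n degrees
    # of freedom; the number of tokens enters through df = 2n.
    spam_stat = -2.0 * sum(math.log(1.0 - p) for p in token_probs)
    ham_stat = -2.0 * sum(math.log(p) for p in token_probs)
    s = 1.0 - chi2q(spam_stat, 2 * n)  # evidence for spam
    h = 1.0 - chi2q(ham_stat, 2 * n)   # evidence for ham
    return (1.0 + s - h) / 2.0


if __name__ == "__main__":
    print(fisher_score([0.99, 0.95, 0.90]))
    print(fisher_score([0.05, 0.10, 0.02]))
```

Because the per-token inputs are the Robinson values, anything that changes those values also changes the Fisher result, which is consistent with David's remark that Robinson settings generally apply to Fisher.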
>>10) STATS_IN_HEADER (good, but no need for #stats_in_header=N)
>>
>>11) CUTOFF Values (misplaced, I'd suggest moving all
>>settings that define values for specific algorithms to the
>>very end, many values here are also listed below, good
>>explanation, though)
>>
>>12) THRESHOLD Values (???)
> 
> This'll take a while to explain.  I'll have to send you another message on 
> these.
> 
> 
>>
>>15) New Message Formatting Options (this needs to be renamed,
>>good text)
>>
>>16) more details for algorithms (should stay here, but
>>settings from 9), 11), and 12)? should be merged here)
    
    