default settings for creating wordlists in 0.15.7

David Relson relson at osagesoftware.com
Tue Nov 11 13:34:18 CET 2003


On Tue, 11 Nov 2003 14:56:01 +0400
Mike Lykov <combr at vesna.ru> wrote:

> Hi all.
> 
> I found this strange deviation in version 0.15.7 (I don't want to
> install a newer version; it seems somewhat unstable ;)
> 
> [combr at mail .bogofilter]$ bogofilter -h
> bogofilter version 0.15.7
>           -P {opts} - set html processing flag(s).
>              where {opts} is one or more of:
>               i   - enables  ignoring of upper/lower case.
>               I   - disables ignoring of upper/lower case (default).
> 
> (by the way, why is it ignoring by default?)
> 
>               h   - enables  header line tagging (default).
>               H   - disables header line tagging.

Mike,

Bogofilter used to be case insensitive, so "Mike", "mike", and "MIKE"
would all go into the wordlist as "mike".  It was changed some time ago
to be case sensitive, so capitalization now matters and "Mike", "mike",
and "MIKE" are three different wordlist entries.  There was a big
discussion over which flags to use to enable/disable case insensitivity.
The majority opinion was to use "i" and "I" for case insensitivity
(ignoring case differences), with the lower case letter enabling a
feature and the upper case letter disabling it.  As we wanted to have
more information in the wordlist, we wanted the defaults to be off for
case insensitivity, i.e. "-PI", and on for header line tagging, i.e.
"-Ph".
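
To make the difference concrete, here is a small sketch using only
standard shell tools.  It is not bogofilter's lexer; count_tokens is a
made-up helper name, and "fold"/"keep" are just labels that stand in for
the effect of "-Pi" and "-PI" on how many distinct entries result.

```shell
# Illustration only: plain POSIX tools, not bogofilter's actual lexer.
# count_tokens MODE reads words on stdin and prints how many distinct
# tokens remain.
#   MODE=fold  lowercases first (roughly what ignoring case means)
#   MODE=keep  leaves capitalization alone (the default described above)
count_tokens() {
    mode=$1
    tr -s ' \t' '\n\n' |
    if [ "$mode" = fold ]; then
        tr '[:upper:]' '[:lower:]'
    else
        cat
    fi |
    LC_ALL=C sort -u | grep -c .
}

echo "Mike mike MIKE" | count_tokens keep   # three distinct entries
echo "Mike mike MIKE" | count_tokens fold   # one entry, "mike"
```

With case kept, the three spellings stay separate, which is the extra
information in the wordlist that motivated the default.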

Try running the following commands:

	echo this is a test | bogofilter -vvv
	echo this is a test | bogofilter -C -vvv
	echo this is a test | bogofilter -C -vvv -Ph
	echo this is a test | bogofilter -C -vvv -PH

The first one runs using _your_ bogofilter.cf file.  All the others use
-C and so run without a config file: the second gives the default
behavior, the third turns on header line tagging, and the fourth turns
it off.  When I run the commands, I see tokens "head:this" and
"head:test" for the first three commands, but not the fourth.  That's
the correct behavior (and corresponds to the help message).
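
If the verbose output is long, a tiny helper can make the comparison
easier.  This is only a sketch: count_head_tokens is a hypothetical
name, and it assumes nothing about the output format beyond the
"head:" prefix mentioned above.  The 2>&1 in the usage comments is
there because I am not assuming which stream the verbose output goes to.

```shell
# Hypothetical helper: count input lines that mention a "head:" token.
# "|| true" keeps the exit status at 0 when there are no matches.
count_head_tokens() {
    grep -c 'head:' || true
}

# Usage with the commands above (needs bogofilter installed):
#   echo this is a test | bogofilter -C -vvv -Ph 2>&1 | count_head_tokens
#   echo this is a test | bogofilter -C -vvv -PH 2>&1 | count_head_tokens

# Synthetic input, so the helper can be tried without bogofilter:
printf 'head:this\nhead:test\nsomething else\n' | count_head_tokens
```

A nonzero count for -Ph and zero for -PH would match the behavior
described above.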

David

>  
> But in practice it's disabled:
> $bogofilter -n < newnonspam
> $bogoutil -d goodlist.db > good1

Question:  what version of bogofilter are you using?  The default
behavior is a combined wordlist (wordlist.db), rather than separate
wordlists (goodlist.db and spamlist.db).  Do you have a config file
setting non-standard options?


> In file good1 I cannot see any tags, like head: or from:
> x-mailer 6 20031111
> x-mimeole 5 20031111
> 
> If I add an option like this:
> $bogofilter -n -Ph < newnonspam
> $bogoutil -d goodlist.db > good2
> 
> then I see in file good2:
> head:x-mailer 6 20031111
> head:x-mimeole 5 20031111
> 
> WHY ?
> 
> I think that if it's the default, then I should have to turn it OFF
> with -PH, and not turn it on as above?



