support for multiple wordlists

David Relson relson at osagesoftware.com
Tue May 18 01:08:07 CEST 2004


On Mon, 17 May 2004 17:57:06 -0400
Tom Allison wrote:

> David Relson wrote:
> > Greetings,
> > 
> > At one time, bogofilter had support for multiple wordlists.  I'm
> > thinking of resurrecting the code.  Here's how I think it should
> > operate:
> > 
> > Wordlists have a number of attributes, notably name, filename,
> > precedence, and type.  
> > 
> > Name:  a short identifying symbol used when printing (error)
> > messages. Examples are "global", "user", "ignore".
> > 
> > Filename:  When opening the wordlist, if the name is fully
> > qualified (with a leading '/' or '~'), that name is used, else the
> > usual search order is used, i.e. $BOGOFILTER_DIR, $BOGODIR, $HOME.
> > 
> > Precedence: an integer like 1, 2, 3, ...  Wordlists are searched in
> > ascending order for the token.  If the search token is found, lists
> > with the same precedence number will be checked (and counts added
> > together). Lists with higher precedence numbers will not be checked.
> > 
> > Type: 'R' and 'I' (for "regular" and "ignore").  Current wordlists
> > are of type 'R'. Type 'I' means "don't score the token if found in
> > the ignore list".
> > 
> 
> I had assumed that if you had both /etc/bogofilter/wordlist.db (or
> /var/lib/bogofilter/wordlist.db) and ~/.bogofilter/wordlist.db that
> they might be shared in some way (probably with global first, user
> second, just like procmail rules).
> I guess I was just thinking of going with lots of procmail glue to
> make this all happen.

Hi Tom,

It's not at all obvious to me whether "global then user" is preferable
to "user then global".  Precedence varies with environment.
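
To make the precedence rule from the proposal concrete, here is a
minimal Python sketch (not bogofilter's C code).  The Wordlist class,
the lookup() function, and the (spam, good) count pairs are assumptions
made for illustration.  It only shows the search order: lowest
precedence first, counts added within one precedence level, later
levels skipped once the token is found, and a hit in a type 'I' list
meaning "don't score".

```python
from dataclasses import dataclass, field
from itertools import groupby
from typing import Dict, List, Optional, Tuple

Counts = Tuple[int, int]  # (spam count, good count): an assumed layout


@dataclass
class Wordlist:
    name: str          # e.g. "global", "user", "ignore"
    precedence: int    # lists are searched in ascending order
    kind: str          # 'R' (regular) or 'I' (ignore)
    counts: Dict[str, Counts] = field(default_factory=dict)


def lookup(token: str, wordlists: List[Wordlist]) -> Optional[Counts]:
    """Return summed counts for token, or None if it must not be scored."""
    ordered = sorted(wordlists, key=lambda w: w.precedence)
    for _, group in groupby(ordered, key=lambda w: w.precedence):
        hits = [w for w in group if token in w.counts]
        if not hits:
            continue  # not found at this level, try the next one
        if any(w.kind == "I" for w in hits):
            return None  # found in an ignore list: don't score the token
        # Found: add counts from all lists at this precedence and stop.
        spam = sum(w.counts[token][0] for w in hits)
        good = sum(w.counts[token][1] for w in hits)
        return spam, good
    return (0, 0)  # token is in no list at all
```

Whether "global" or "user" comes first is then just a matter of which
list gets the lower number, so the code doesn't need to take sides.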

...[snip]...

> commas and such make me dizzy.  The order numbers (5,6,7,8) might be
> better replaced with a single parameter (eg: wordlist_order).  Also,
> the terms you've described above may have redundancy.  Aren't
> 'ignore' and 'R' redundant?  What happens when I have
> "wordlist=ignore, ~/ignorelist.db, 7, R" ???  (See your NOTE 3)

Spaces instead of commas would be fine.

> suggestion:
> wordlist_user= ~/.bogofilter/wordlist.db
> wordlist_global=/var/lib/bogofilter/wordlist.db
> wordlist_ignore=~/.bogofilter/ignorelist.db
> wordlist_order= global ignore user (whitespace separated: " " or \n...)

In bogofilter's config file processing code, all lines are of the form
"key=value(s)" and there's a list of valid keys.  Having
"key_name=value", "key_name2=value", is a problem, since every such key
would have to be added to that list.

Also, having a separate order line precludes additive operations, i.e.
giving two lists the same precedence so that their counts are added
together.
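
As a rough illustration of the single-key approach, here is a Python
sketch (again, not the actual C code).  It assumes the field order
"name, filename, precedence, type" from the example quoted above,
accepts commas or spaces, and treats '#' as a comment character.  The
VALID_KEYS set, parse_config() and WordlistSpec are hypothetical names.
It also applies the rule from Note 3 of the proposal: types 'R' and 'I'
may not share a precedence.

```python
import re
from typing import Dict, List, NamedTuple

# Illustrative subset only; the real list of valid keys is longer.
VALID_KEYS = {"wordlist", "min_dev", "robx"}


class WordlistSpec(NamedTuple):
    name: str
    filename: str
    precedence: int
    kind: str  # 'R' or 'I'


def parse_config(text: str) -> List[WordlistSpec]:
    """Collect every "wordlist=" line; the key may repeat."""
    specs: List[WordlistSpec] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        key = key.strip()
        if not sep or key not in VALID_KEYS:
            raise ValueError(f"unknown or malformed line: {raw!r}")
        if key != "wordlist":
            continue  # other settings are outside this sketch
        # Commas or whitespace both work (so no spaces in filenames here).
        fields = [f for f in re.split(r"[,\s]+", value.strip()) if f]
        if len(fields) != 4:
            raise ValueError(f"expected 4 fields: {raw!r}")
        name, filename, precedence, kind = fields
        if kind not in ("R", "I"):
            raise ValueError(f"type must be 'R' or 'I': {raw!r}")
        specs.append(WordlistSpec(name, filename, int(precedence), kind))

    # Note 3: 'R' and 'I' lists of the same precedence are contradictory.
    kind_at: Dict[int, str] = {}
    for spec in specs:
        if kind_at.setdefault(spec.precedence, spec.kind) != spec.kind:
            raise ValueError(f"mixed types at precedence {spec.precedence}")
    return specs
```

With one repeatable key, two lists can simply carry the same number,
and a line like "wordlist_user=..." is rejected as an unknown key.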

> This removes the need for the 'ignore' and 'R' checks.  And the order
> of precedence appears in one line of the configuration file and not
> across 3 (or more if you have a lot of REM'ed lines for old stuff).
> 
> How would the separate wordlists affect the configurations
> (min_dev, threshold, robx... bogotune stuff)?  I think this only
> applies to global/user lists.

There's no effect.  The scoring parameters are applied separately from
finding tokens in the wordlist(s).

> > Note 1: bogofilter's registration flags ('-s', '-n', '-u', '-S',
> > '-N' ) will apply to the first list named.
> 
> Similar to bogotune, could you default to the wordlist_user for these
> params unless you specified otherwise?  Not sure, but maybe:
> bogofilter -u   ==> defaults to wordlist_user
> bogofilter -u wordlist_global ==> only wordlist_global
> bogofilter -u wordlist_global wordlist_user  ==> does both: space
> separated list?
> 
> A really complicated version would be something like:
> bogofilter -pe wordlist_global -u wordlist_user (assumes previous -pe?)
> bogofilter -n wordlist_user -Sn wordlist_global
> bogofilter -Sn wordlist_user wordlist_global  (space separated list
> affects both)

Command line parsing uses the library function getopt(), and optional
parameters are a problem.  Given the number of platforms which run
bogofilter and the many variants of getopt(), using optional parameters
is a no-no.
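
A small demonstration of the difficulty, using Python's getopt module
(which follows the Unix getopt() conventions) rather than the C
library.  POSIX getopt() lets an option either never take an argument
or always take one; an optional argument is a GNU extension ("::").
The option strings "un" and "u:n" below are made up for the example and
are not bogofilter's real option string.

```python
import getopt


def parse(argv, shortopts):
    """Return (options, leftover arguments) as getopt sees them."""
    return getopt.getopt(argv, shortopts)


# "-u" declared as a plain flag: the list name is not attached to -u,
# it is just a leftover argument.
flag_style = parse(["-u", "wordlist_global"], "un")

# "-u" declared as taking an argument ("u:"): now "-u -n" swallows
# "-n" as if it were the list name.
arg_style = parse(["-u", "-n"], "u:n")


def bare_u_fails():
    """With "u:", a bare -u at the end of the command line is an error."""
    try:
        parse(["-u"], "u:n")
    except getopt.GetoptError:
        return True
    return False
```

Neither declaration gives "-u, optionally followed by a list name", so
the suggested syntax can't be had portably.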

> (the -peu example above might be pretty lame...)
> 
> > Note 2: to build an ignore list, create a text file (for example,
> > ignorelist.txt) using any text editor, then use bogoutil to convert
> > it to database format, e.g. "bogoutil -l ignorelist.db <
> > ignorelist.txt".
> > 
> 
> OK: echo "foo" | bogoutil -l ignorelist.db
> should work as well for individuals.

Right!  Thanks for the reminder.

> > Note 3: having lists of types 'R' and 'I' of the same precedence
> > won't be allowed because the types are contradictory.
> 
> See comments about wordlist_user and such above.  I think you can 
> relabel the parameters and exclude this problem from happening.

Relabeling prevents this problem and causes others...

David


