support for multiple wordlists

Tom Allison tallison at tacocat.net
Mon May 17 23:57:06 CEST 2004


David Relson wrote:
> Greetings,
> 
> At one time, bogofilter had support for multiple wordlists.  I'm
> thinking of resurrecting the code.  Here's how I think it should
> operate:
> 
> Wordlists have a number of attributes, notably name, filename,
> precedence, and type.  
> 
> Name:  a short identifying symbol used when printing (error) messages. 
> Examples are "global", "user", "ignore".
> 
> Filename:  When opening the wordlist, if the name is fully qualified
> (with a leading '/' or '~'), that name is used, else the usual search
> order is used, i.e. $BOGOFILTER_DIR, $BOGODIR, $HOME.
> 
> Precedence: an integer like 1, 2, 3, ...  Wordlists are searched in
> ascending order for the token.  If the search token is found, lists with
> the same precedence number will be checked (and counts added together). 
> Lists with higher precedence numbers will not be checked.
> 
> Type: 'R' and 'I' (for "regular" and "ignore").  Current wordlists are
> of type 'R'. Type 'I' means "don't score the token if found in the
> ignore list".
> 
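
A rough Python sketch of the filename rule as I read it, just to pin down 
the search order.  The function name and the idea of passing the 
environment in as a dict are my own assumptions, and it simply joins the 
filename onto each directory exactly as listed (no '~' expansion, no 
check that the file exists):

```python
import posixpath


def resolve_wordlist_path(filename, env):
    """Return candidate paths for a wordlist file, most preferred first.

    `env` is a dict of environment variables (hypothetical interface).
    """
    # A fully qualified name (leading '/' or '~') is used as given.
    if filename.startswith(("/", "~")):
        return [filename]
    # Otherwise try the usual search order from the proposal.
    candidates = []
    for var in ("BOGOFILTER_DIR", "BOGODIR", "HOME"):
        directory = env.get(var)
        if directory:
            candidates.append(posixpath.join(directory, filename))
    return candidates
```

A caller would open the first candidate that actually exists.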

I had assumed that if you had both /etc/bogofilter/wordlist.db (or 
/var/lib/bogofilter/wordlist.db) and ~/.bogofilter/wordlist.db that they 
might be shared in some way (probably with global first, user second, 
just like procmail rules).
I guess I was just thinking of going with lots of procmail glue to make 
this all happen.

> Example 1 - merge user and system lists:
> 
>   wordlist=user, ~/wordlist.db, 1, R
>   wordlist=system, /var/spool/bogofilter/wordlist.db, 1, R
> 
> Example 2 - prefer user to system list:
> 
>   wordlist=user, ~/wordlist.db, 2, R
>   wordlist=system, /var/spool/bogofilter/wordlist.db, 3, R
> 
> Example 3 - prefer system to user list:
> 
>   wordlist=user, ~/wordlist.db, 5, R
>   wordlist=system, /var/spool/bogofilter/wordlist.db, 4, R
> 
> Example 4 - prefer user list to system list.  If not in user list and in
> ignore list, don't check further:
> 
>   wordlist=user, ~/wordlist.db, 6, R
>   wordlist=ignore, ~/ignorelist.db, 7, I
>   wordlist=system, /var/spool/bogofilter/wordlist.db, 8, R
> 
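
To check that I follow the four examples, here is a minimal Python 
sketch of the lookup.  It is only my reading of the proposal, not 
bogofilter code: the Wordlist tuple, the in-memory token dicts and the 
(spam, ham) count pairs are all assumptions for illustration.

```python
from collections import namedtuple

# Hypothetical in-memory stand-in for a wordlist database.
# `tokens` maps token -> (spam_count, ham_count); for type 'I' lists
# only the keys matter.
Wordlist = namedtuple("Wordlist", "name precedence type tokens")


def lookup(token, wordlists):
    """Return summed (spam, ham) counts, or None if ignored or not found."""
    by_precedence = {}
    for wl in wordlists:
        by_precedence.setdefault(wl.precedence, []).append(wl)

    # Search in ascending precedence order.
    for precedence in sorted(by_precedence):
        hits = [wl for wl in by_precedence[precedence] if token in wl.tokens]
        if not hits:
            continue
        # Found in an ignore list: don't score the token at all.
        if any(wl.type == "I" for wl in hits):
            return None
        # Found in regular lists: add counts from lists of equal precedence
        # and stop; higher precedence numbers are not checked.
        spam = sum(wl.tokens[token][0] for wl in hits)
        ham = sum(wl.tokens[token][1] for wl in hits)
        return (spam, ham)
    return None
```

With Example 1 (both lists at precedence 1) the counts get added 
together; with Example 4 a token in the ignore list never reaches the 
system list.  The sketch returns None both for "ignored" and for "not 
found", which a real implementation might want to tell apart.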

Commas and such make me dizzy.  The order numbers (5, 6, 7, 8) might be 
better replaced with a single parameter (e.g. wordlist_order).  Also, the 
terms you've described above may have some redundancy.  Aren't the name 
'ignore' and the type letter redundant?  What happens when I have
"wordlist=ignore, ~/ignorelist.db, 7, R" ???  (See your NOTE 3)

Suggestion:
wordlist_user=~/.bogofilter/wordlist.db
wordlist_global=/var/lib/bogofilter/wordlist.db
wordlist_ignore=~/.bogofilter/ignorelist.db
wordlist_order=global ignore user   (whitespace separated: " " or \n...)

This removes the need for a separate 'R'/'I' type field and for checking 
it against the list name.  And the order of precedence appears on one 
line of the configuration file rather than across 3 (or more if you have 
a lot of REM'ed lines for old stuff).
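
Something like the following Python sketch is what I have in mind for 
reading such a config.  The function name is made up, '#' as the comment 
character is an assumption, and for simplicity it only handles 
wordlist_order on a single line (not the newline-separated variant):

```python
def parse_wordlist_config(text):
    """Return [(name, path), ...] in the order given by wordlist_order."""
    paths = {}
    order = []
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments ('#' is an assumed comment marker).
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "wordlist_order":
            # Whitespace-separated list of wordlist names.
            order = value.split()
        elif key.startswith("wordlist_"):
            # wordlist_<name>=<path>
            paths[key[len("wordlist_"):]] = value

    unknown = [name for name in order if name not in paths]
    if unknown:
        raise ValueError("wordlist_order names undefined lists: %s"
                         % ", ".join(unknown))
    return [(name, paths[name]) for name in order]
```

Lists that are defined but not named in wordlist_order are simply left 
out, which would make it easy to disable one without deleting its line.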

How would you handle separate wordlists for the configuration parameters 
(min_dev, threshold, robx... the bogotune stuff)?  I think this only 
applies to global/user lists.

> Note 1: bogofilter's registration flags ('-s', '-n', '-u', '-S', '-N' )
> will apply to the first list named.

Similar to bogotune, could you default to wordlist_user for these 
params unless specified otherwise?  Not sure, but maybe:
bogofilter -u   ==> defaults to wordlist_user
bogofilter -u wordlist_global ==> only wordlist_global
bogofilter -u wordlist_global wordlist_user  ==> does both: space 
separated list?

A really complicated version would be something like:
bogofilter -pe wordlist_global -u wordlist_user (assumes previous -pe?)
bogofilter -n wordlist_user -Sn wordlist_global
bogofilter -Sn wordlist_user wordlist_global  (space separated list 
affects both)

(the -peu example above might be pretty lame...)

> Note 2: to build an ignore list, create a text file (for example,
> ignorelist.txt) using any text editor, then use bogoutil to convert it
> to database format, e.g. "bogoutil -l ignorelist.db < ignorelist.txt".
> 

OK: echo "foo" | bogoutil -l ignorelist.db
should work as well for individuals.

> Note 3: having lists of types 'R' and 'I' of the same precedence won't
> be allowed because the types are contradictory.

See comments about wordlist_user and such above.  I think you can 
relabel the parameters and keep this problem from happening at all.
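
If the comma-style format stays, the check that Note 3 implies could be 
as small as this Python sketch (the function name and the 
(name, precedence, type) tuples are assumptions for illustration):

```python
def conflicting_precedences(wordlists):
    """Return the sorted precedence numbers that mix types 'R' and 'I'.

    `wordlists` is an iterable of (name, precedence, type) tuples.
    """
    types_seen = {}
    for _name, precedence, wl_type in wordlists:
        types_seen.setdefault(precedence, set()).add(wl_type)
    # A precedence level is contradictory if it holds both types.
    return sorted(p for p, types in types_seen.items()
                  if {"R", "I"} <= types)
```

An empty result means the configuration is acceptable; anything else 
would be reported as an error at startup.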



