support for multiple wordlists
Tom Allison
tallison at tacocat.net
Mon May 17 23:57:06 CEST 2004
David Relson wrote:
> Greetings,
>
> At one time, bogofilter had support for multiple wordlists. I'm
> thinking of resurrecting the code. Here's how I think it should
> operate:
>
> Wordlists have a number of attributes, notably name, filename,
> precedence, and type.
>
> Name: a short identifying symbol used when printing (error) messages.
> Examples are "global", "user", "ignore".
>
> Filename: When opening the wordlist, if the name is fully qualified
> (with a leading '/' or '~'), that name is used, else the usual search
> order is used, i.e. $BOGOFILTER_DIR, $BOGODIR, $HOME.
>
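The filename rule above can be sketched as follows. This is a minimal illustration, not bogofilter's code: `resolve_wordlist_path` and `SEARCH_VARS` are hypothetical names, and it assumes "the usual search order" means that the first of $BOGOFILTER_DIR, $BOGODIR, $HOME that is set supplies the directory.

```python
import os

# Hypothetical sketch, not bogofilter's code. Assumption: "search order"
# means the first of these variables that is set supplies the directory.
SEARCH_VARS = ("BOGOFILTER_DIR", "BOGODIR", "HOME")


def resolve_wordlist_path(filename, env=None):
    """Return the path a wordlist file would be opened from."""
    if env is None:
        env = os.environ
    # Fully qualified names are used as given; a leading '~' is
    # expanded with os.path.expanduser.
    if filename.startswith("/"):
        return filename
    if filename.startswith("~"):
        return os.path.expanduser(filename)
    # Otherwise try each directory variable in the stated order.
    for var in SEARCH_VARS:
        directory = env.get(var)
        if directory:
            return os.path.join(directory, filename)
    # Hypothetical behaviour when no directory variable is set.
    raise LookupError("set one of: " + ", ".join(SEARCH_VARS))
```

For example, `resolve_wordlist_path("wordlist.db", {"BOGODIR": "/b", "HOME": "/h"})` picks $BOGODIR because $BOGOFILTER_DIR is unset.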
> Precedence: an integer like 1, 2, 3, ... Wordlists are searched in
> ascending order for the token. If the search token is found, lists with
> the same precedence number will be checked (and counts added together).
> Lists with higher precedence numbers will not be checked.
>
> Type: 'R' and 'I' (for "regular" and "ignore"). Current wordlists are
> of type 'R'. Type 'I' means "don't score the token if found in the
> ignore list".
>
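The precedence and type rules above can be sketched as a lookup over in-memory lists. This assumes each wordlist is a dict mapping a token to (spam, good) counts; `Wordlist`, `lookup` and `IGNORED` are hypothetical names chosen for illustration, not bogofilter's actual data structures.

```python
from dataclasses import dataclass, field
from itertools import groupby

IGNORED = "ignored"  # hypothetical marker: token must not be scored


@dataclass
class Wordlist:
    name: str
    precedence: int
    kind: str  # 'R' (regular) or 'I' (ignore)
    counts: dict = field(default_factory=dict)  # token -> (spam, good)


def lookup(token, wordlists):
    """Return summed (spam, good) counts, IGNORED, or None if not found."""
    ordered = sorted(wordlists, key=lambda wl: wl.precedence)
    # Search precedence levels in ascending order.
    for _, level in groupby(ordered, key=lambda wl: wl.precedence):
        hits = [wl for wl in level if token in wl.counts]
        if not hits:
            continue  # not at this level; try the next precedence number
        if any(wl.kind == "I" for wl in hits):
            return IGNORED  # found in an ignore list: don't score it
        # Lists sharing this precedence are all checked and counts added.
        spam = sum(wl.counts[token][0] for wl in hits)
        good = sum(wl.counts[token][1] for wl in hits)
        return (spam, good)  # higher precedence numbers are not checked
    return None
```

With user and system lists both at precedence 1, a token's counts from the two lists are added; at precedences 2 and 3, a hit in the user list stops the search.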
I had assumed that if you had both /etc/bogofilter/wordlist.db (or
/var/lib/bogofilter/wordlist.db) and ~/.bogofilter/wordlist.db, they
might be shared in some way (probably with global first, user second,
just like procmail rules).
I guess I was just thinking of going with lots of procmail glue to make
this all happen.
> Example 1 - merge user and system lists:
>
> wordlist=user, ~/wordlist.db, 1, R
> wordlist=system, /var/spool/bogofilter/wordlist.db, 1, R
>
> Example 2 - prefer user to system list:
>
> wordlist=user, ~/wordlist.db, 2, R
> wordlist=system, /var/spool/bogofilter/wordlist.db, 3, R
>
> Example 3 - prefer system to user list:
>
> wordlist=user, ~/wordlist.db, 5, R
> wordlist=system, /var/spool/bogofilter/wordlist.db, 4, R
>
> Example 4 - prefer user list to system list. If not in user list and in
> ignore list, don't check further:
>
> wordlist=user, ~/wordlist.db, 6, R
> wordlist=ignore, ~/ignorelist.db, 7, I
> wordlist=system, /var/spool/bogofilter/wordlist.db, 8, R
>
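The proposed `wordlist=name, filename, precedence, type` lines could be read with a small parser like this. It is a sketch assuming exactly four comma-separated fields (so a filename containing a comma would not parse); `WordlistSpec` and `parse_wordlist_line` are hypothetical names.

```python
from typing import NamedTuple


class WordlistSpec(NamedTuple):
    name: str
    filename: str
    precedence: int
    kind: str  # 'R' (regular) or 'I' (ignore)


def parse_wordlist_line(line):
    """Parse 'wordlist=NAME, FILENAME, PRECEDENCE, TYPE' as proposed above."""
    key, sep, value = line.partition("=")
    if key.strip() != "wordlist" or not sep:
        raise ValueError("not a wordlist line: %r" % line)
    fields = [f.strip() for f in value.split(",")]
    if len(fields) != 4:
        raise ValueError("expected 4 comma-separated fields: %r" % line)
    name, filename, precedence, kind = fields
    if kind not in ("R", "I"):
        raise ValueError("type must be 'R' or 'I': %r" % kind)
    return WordlistSpec(name, filename, int(precedence), kind)
```

Applied to Example 4, each of the three lines yields one `WordlistSpec` with precedence 6, 7 or 8.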
Commas and such make me dizzy. The order numbers (5, 6, 7, 8) might be
better replaced with a single parameter (e.g. wordlist_order). Also, the
terms you've described above may be redundant. Aren't the name 'ignore'
and the type letter ('R'/'I') redundant? What happens when I have
"wordlist=ignore, ~/ignorelist.db, 7, R"??? (See your Note 3.)
Suggestion:
wordlist_user= ~/.bogofilter/wordlist.db
wordlist_global=/var/lib/bogofilter/wordlist.db
wordlist_ignore=~/.bogofilter/ignorelist.db
wordlist_order= global ignore user (whitespace separated: " " or \n...)
This makes each list's role follow from its name, so there is no
'ignore' vs. 'R' combination to check. And the order of precedence
appears on one line of the configuration file rather than across three
(or more, if you have a lot of commented-out lines for old stuff).
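The wordlist_order suggestion could be read roughly as below. This sketch assumes '#' starts a comment line, that the order appears on a single line, and that a list named "ignore" is an ignore list while every other name is regular; `parse_ordered_config` is a hypothetical name.

```python
def parse_ordered_config(lines):
    """Return [(name, filename, kind), ...] in wordlist_order order."""
    paths = {}
    order = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank and (assumed) '#' comment lines
        key, sep, value = line.partition("=")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "wordlist_order":
            order = value.split()  # names separated by any whitespace
        elif key.startswith("wordlist_"):
            paths[key[len("wordlist_"):]] = value
    missing = [name for name in order if name not in paths]
    if missing:
        raise ValueError("wordlist_order names undefined lists: " + ", ".join(missing))
    # Assumption: the role follows from the name, so no separate R/I flag.
    return [(name, paths[name], "I" if name == "ignore" else "R") for name in order]
```

Because the type is derived from the name, a combination like "ignore" with type 'R' cannot be written at all.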
How would you handle separate configuration settings (min_dev,
threshold, robx... the bogotune stuff) for the separate wordlists? I
think this only applies to the global/user lists.
> Note 1: bogofilter's registration flags ('-s', '-n', '-u', '-S', '-N' )
> will apply to the first list named.
As with bogotune, could these flags default to wordlist_user unless you
specify otherwise? Not sure, but maybe:
bogofilter -u ==> defaults to wordlist_user
bogofilter -u wordlist_global ==> only wordlist_global
bogofilter -u wordlist_global wordlist_user ==> does both (space-separated
list?)
A really complicated version would be something like:
bogofilter -pe wordlist_global -u wordlist_user (assumes previous -pe?)
bogofilter -n wordlist_user -Sn wordlist_global
bogofilter -Sn wordlist_user wordlist_global (space-separated list
affects both)
(the -peu example above might be pretty lame...)
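The command-line idea above (each flag followed by the list names it applies to, defaulting to wordlist_user) can be sketched as a simple argument grouper. This syntax is only Tom's proposal, not an existing bogofilter option format, and `group_registration_args` is a hypothetical name.

```python
def group_registration_args(argv, default="wordlist_user"):
    """Pair each flag with the list names that follow it (proposed syntax)."""
    groups = []
    for arg in argv:
        if arg.startswith("-"):
            groups.append((arg, []))  # start a new flag group
        elif groups:
            groups[-1][1].append(arg)  # list name for the latest flag
        else:
            raise ValueError("list name %r given before any flag" % arg)
    # A flag with no list names falls back to the default list.
    return [(flag, names or [default]) for flag, names in groups]
```

For instance, `-n wordlist_user -Sn wordlist_global` groups into two flags with one list each, while a bare `-u` falls back to wordlist_user.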
> Note 2: to build an ignore list, create a text file (for example,
> ignorelist.txt) using any text editor, then use bogoutil to convert it
> to database format, e.g. "bogoutil -l ignorelist.db < ignorelist.txt".
>
OK: echo "foo" | bogoutil -l ignorelist.db
should work as well for individuals.
> Note 3: having lists of types 'R' and 'I' of the same precedence won't
> be allowed because the types are contradictory.
See my comments about wordlist_user and such above. I think you can
relabel the parameters and prevent this problem from happening at all.
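Under the original comma-separated scheme, Note 3's rule could be enforced when the configuration is read. A minimal sketch, assuming each list is given as a (name, precedence, type) triple; `check_precedence_types` is a hypothetical name.

```python
from collections import defaultdict


def check_precedence_types(specs):
    """Raise ValueError if any precedence level mixes types 'R' and 'I'."""
    kinds = defaultdict(set)
    names = defaultdict(list)
    for name, precedence, kind in specs:
        kinds[precedence].add(kind)
        names[precedence].append(name)
    for precedence in sorted(kinds):
        if len(kinds[precedence]) > 1:  # both 'R' and 'I' present
            raise ValueError("precedence %d mixes types R and I: %s"
                             % (precedence, ", ".join(names[precedence])))
```

Example 4 passes this check because its regular and ignore lists sit at different precedences (6, 7, 8).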
More information about the Bogofilter
mailing list