support for multiple wordlists

David Relson relson at osagesoftware.com
Mon May 17 15:53:16 CEST 2004


Greetings,

At one time, bogofilter had support for multiple wordlists.  I'm
thinking of resurrecting the code.  Here's how I think it should
operate:

Wordlists have a number of attributes, notably name, filename,
precedence, and type.  

Name:  a short identifying symbol used when printing (error) messages. 
Examples are "global", "user", "ignore".

Filename:  When opening the wordlist, if the name is fully qualified
(with a leading '/' or '~'), that name is used, else the usual search
order is used, i.e. $BOGOFILTER_DIR, $BOGODIR, $HOME.
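
A minimal sketch of the filename rule above, in Python for illustration
only (bogofilter itself is written in C).  The function name
resolve_wordlist_path is hypothetical, and the sketch assumes the search
order means "use the directory from the first of these variables that is
set"; the real lookup may differ, for example by appending a
subdirectory.

```python
import os

def resolve_wordlist_path(filename, env=None):
    """Return the path to open for a wordlist filename (hypothetical helper)."""
    env = os.environ if env is None else env
    # A leading '/' marks the name as fully qualified: use it unchanged.
    if filename.startswith("/"):
        return filename
    # A leading '~' is also fully qualified: expand it to the home directory.
    if filename.startswith("~"):
        if filename == "~" or filename.startswith("~/"):
            return env.get("HOME", "~") + filename[1:]
        return os.path.expanduser(filename)  # "~otheruser/..." form
    # Otherwise fall back to the search order (assumed: first variable set wins).
    for var in ("BOGOFILTER_DIR", "BOGODIR", "HOME"):
        directory = env.get(var)
        if directory:
            return os.path.join(directory, filename)
    return filename  # nothing set: leave the name relative
```

For instance, with only HOME=/home/david set, "wordlist.db" resolves to
/home/david/wordlist.db, while "/var/spool/bogofilter/wordlist.db" is
returned unchanged.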

Precedence: an integer like 1, 2, 3, ...  Wordlists are searched for the
token in ascending order of precedence.  If the token is found, all lists
with the same precedence number are checked (and their counts added
together).  Lists with higher precedence numbers are not checked.

Type: 'R' or 'I' (for "regular" and "ignore").  Current wordlists are
of type 'R'. Type 'I' means "don't score the token if found in the
ignore list".
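
The precedence and type rules can be sketched as follows.  This is an
illustration in Python, not the proposed C implementation: the Wordlist
class, the lookup function, and the in-memory dict standing in for the
database are all assumptions.  The sketch also assumes that a token
found nowhere yields zero counts, and that an ignored token is reported
as None.

```python
from dataclasses import dataclass, field

@dataclass
class Wordlist:
    name: str
    precedence: int
    kind: str                                   # 'R' (regular) or 'I' (ignore)
    counts: dict = field(default_factory=dict)  # token -> (spam, good)

def lookup(wordlists, token):
    """Return summed (spam, good) counts, or None if the token is ignored."""
    # Group the lists by precedence number.
    by_prec = {}
    for wl in wordlists:
        by_prec.setdefault(wl.precedence, []).append(wl)
    # Walk the groups in ascending precedence; stop at the first group with a hit.
    for prec in sorted(by_prec):
        hits = [wl for wl in by_prec[prec] if token in wl.counts]
        if not hits:
            continue
        if any(wl.kind == "I" for wl in hits):
            return None  # found in an ignore list: do not score the token
        # Same-precedence lists are merged by adding their counts together.
        spam = sum(wl.counts[token][0] for wl in hits)
        good = sum(wl.counts[token][1] for wl in hits)
        return (spam, good)
    return (0, 0)  # assumption: an unknown token has zero counts

lists = [
    Wordlist("user", 1, "R", {"viagra": (10, 0)}),
    Wordlist("ignore", 2, "I", {"the": (0, 0)}),
    Wordlist("system", 3, "R", {"viagra": (500, 3), "the": (900, 1000), "hello": (2, 40)}),
]
print(lookup(lists, "viagra"))  # (10, 0): the user list wins, system is not checked
print(lookup(lists, "the"))     # None: found in the ignore list first
print(lookup(lists, "hello"))   # (2, 40): falls through to the system list
```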

Example 1 - merge user and system lists:

  wordlist=user, ~/wordlist.db, 1, R
  wordlist=system, /var/spool/bogofilter/wordlist.db, 1, R

Example 2 - prefer user to system list:

  wordlist=user, ~/wordlist.db, 2, R
  wordlist=system, /var/spool/bogofilter/wordlist.db, 3, R

Example 3 - prefer system to user list:

  wordlist=user, ~/wordlist.db, 5, R
  wordlist=system, /var/spool/bogofilter/wordlist.db, 4, R

Example 4 - prefer user list to system list.  If not in user list and in
ignore list, don't check further:

  wordlist=user, ~/wordlist.db, 6, R
  wordlist=ignore, ~/ignorelist.db, 7, I
  wordlist=system, /var/spool/bogofilter/wordlist.db, 8, R
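
A rough sketch of how a "wordlist=" line like those in the examples
might be parsed into its four attributes.  It is written in Python for
illustration; WordlistSpec and parse_wordlist_line are hypothetical
names, and the strictness (exactly four comma-separated fields, type
limited to 'R' or 'I') is an assumption rather than settled behaviour.

```python
from typing import NamedTuple

class WordlistSpec(NamedTuple):
    name: str
    filename: str
    precedence: int
    kind: str  # 'R' or 'I'

def parse_wordlist_line(line):
    """Parse 'wordlist=name, filename, precedence, type' into a WordlistSpec."""
    key, sep, value = line.strip().partition("=")
    if not sep or key.strip() != "wordlist":
        raise ValueError("not a wordlist line: %r" % line)
    fields = [f.strip() for f in value.split(",")]
    if len(fields) != 4:
        raise ValueError("expected name, filename, precedence, type: %r" % line)
    name, filename, precedence, kind = fields
    if kind not in ("R", "I"):
        raise ValueError("type must be 'R' or 'I', got %r" % kind)
    # int() raises ValueError itself if the precedence is not a number.
    return WordlistSpec(name, filename, int(precedence), kind)

spec = parse_wordlist_line("  wordlist=user, ~/wordlist.db, 6, R")
print(spec.name, spec.filename, spec.precedence, spec.kind)
# user ~/wordlist.db 6 R
```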

Note 1: bogofilter's registration flags ('-s', '-n', '-u', '-S', '-N')
will apply to the first list named.

Note 2: to build an ignore list, create a text file (for example,
ignorelist.txt) using any text editor, then use bogoutil to convert it
to database format, e.g. "bogoutil -l ignorelist.db < ignorelist.txt".

Note 3: having lists of types 'R' and 'I' of the same precedence won't
be allowed because the types are contradictory.
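
The check in Note 3 could look something like this sketch (Python, for
illustration).  The function name find_type_conflicts and the
(name, precedence, type) tuple layout are assumptions; the real check
would presumably run when the configuration is read.

```python
def find_type_conflicts(specs):
    """Return the precedence numbers that mix 'R' and 'I' lists.

    specs is an iterable of (name, precedence, type) tuples.
    """
    kinds_by_prec = {}
    for _name, precedence, kind in specs:
        kinds_by_prec.setdefault(precedence, set()).add(kind)
    # A precedence level is in conflict if it carries more than one type.
    return sorted(p for p, kinds in kinds_by_prec.items() if len(kinds) > 1)

bad = [("user", 1, "R"), ("ignore", 1, "I"), ("system", 2, "R")]
print(find_type_conflicts(bad))  # [1]
```

A configuration would be rejected whenever the returned list is
non-empty.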

Feedback requested :-)

David


