support for multiple wordlists
relson at osagesoftware.com
Mon May 17 14:54:17 EDT 2004
On Mon, 17 May 2004 11:33:23 -0700
Greg McCann wrote:
> On 5/17/2004 at 9:53 AM David Relson <relson at osagesoftware.com> wrote:
> >At one time, bogofilter had support for multiple wordlists. I'm
> >thinking of resurrecting the code.
> Yes, please. I would love to have at least the ignore list back. I
> am currently using an occasional filtering of the wordlist db to
> remove words I would like to ignore, but of course these words
> immediately start creeping back into the database again.
> I would also like to have the option for a separate spamlist and
> goodlist again. I get an order of magnitude more spam than ham, and
> good words seem to be more stable than spam words which change
> constantly due to the new products that are released weekly and the
> ever-more-creative spellings of existing products. It was helpful to
> me to be able to apply different maintenance routines to each list,
> keeping the good list relatively stable, while rotating through
> spamwords - deleting unused ones and adding new ones - more quickly.
> Greg McCann
The design allows separate lists. What you want is doable, though it
will be inefficient in storage usage and in speed.
If you define two lists with the same precedence, bogofilter will look
for the search token in both of them and add up the corresponding
ham/spam counts and message counts. Something like the following should work:
wordlist=S, ~/ListS.db, 1, R
wordlist=N, ~/ListN.db, 1, R
When registering, use
bogofilter -s --wordlist=S,~/ListS.db,1,R ...
bogofilter -n --wordlist=N,~/ListN.db,1,R ...
Tokens will be searched for in both lists (because of the same
precedence) and the counts will be added.
The inefficiencies are that two lists must be searched (but that's not a
problem for you) and that each entry will have both spam and ham counts.
In ListS.db all the ham counts will be 0 and in ListN.db all the spam
counts will be 0.
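
To illustrate how the counts from two same-precedence lists combine, here
is a rough Python sketch. It is only a model, not bogofilter's actual code:
the dict-based lists, the lookup() function, and the sample tokens and
counts are all made up for illustration.

```python
# Illustrative model only, not bogofilter's implementation.
# Each wordlist maps token -> (spam_count, ham_count).
list_s = {"pills": (12, 0), "meeting": (1, 0)}    # spam list: every ham count is 0
list_n = {"meeting": (0, 7), "invoice": (0, 3)}   # ham list: every spam count is 0


def lookup(token, wordlists):
    """Sum the spam and ham counts for a token across lists of equal precedence."""
    spam = ham = 0
    for wordlist in wordlists:
        s, h = wordlist.get(token, (0, 0))  # a missing token contributes nothing
        spam += s
        ham += h
    return spam, ham


print(lookup("meeting", [list_s, list_n]))  # (1, 7)
```

Every stored entry carries one count that is always zero, which is the
storage overhead mentioned above.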