[PATCH] combined wordlist a.k.a. single list

Jeremy Blosser jblosser-bogofilter at firinn.org
Tue Jun 3 20:34:35 CEST 2003


On May 31, David Relson [relson at osagesoftware.com] wrote:
> At present, bogofilter's database consists of two files: goodlist.db and 
> spamlist.db.  Each record contains a token, a count of messages in which 
> that token had been encountered, and (optionally) a timestamp.
> 
> This patch converts bogofilter so that it stores all the data in a single 
> file, wordlist.db, with records containing token, spam count, nonspam count 
> and (optionally) the timestamp.

From an administrative perspective I *much* prefer separate wordlists.  The
db utils would probably make it possible to split the lists up and
manipulate them separately when required even if they are stored in one
file, but a generic system/mail admin isn't that familiar with those tools,
and I don't think they should be expected to be.  Further, we run a lot of
automated stuff to do wordlist management, rebuilds, etc., and I want to
keep those tools as simple as possible.
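
To make the two record layouts concrete, here is a rough Python sketch of
what combining and re-splitting would involve.  It uses plain dicts in place
of the real .db files and leaves out the optional timestamp; the function
names merge_wordlists and split_wordlist are made up for illustration and
are not part of bogofilter or the db utils.

```python
from typing import Dict, Tuple

# Hypothetical stand-ins for the on-disk wordlists: plain dicts, no timestamps.
SeparateList = Dict[str, int]               # token -> message count
CombinedList = Dict[str, Tuple[int, int]]   # token -> (spam count, nonspam count)


def merge_wordlists(spam: SeparateList, good: SeparateList) -> CombinedList:
    """Build single-list records from the two separate lists."""
    tokens = set(spam) | set(good)
    # A token missing from one list gets a count of 0 on that side.
    return {t: (spam.get(t, 0), good.get(t, 0)) for t in sorted(tokens)}


def split_wordlist(combined: CombinedList) -> Tuple[SeparateList, SeparateList]:
    """Recover the two separate lists, dropping zero-count entries."""
    spam = {t: s for t, (s, _) in combined.items() if s > 0}
    good = {t: g for t, (_, g) in combined.items() if g > 0}
    return spam, good


spam = {"viagra": 12, "meeting": 1}
good = {"meeting": 30, "patch": 7}
combined = merge_wordlists(spam, good)
spam_again, good_again = split_wordlist(combined)
```

Splitting is mechanical, but it is one more step every management script
would have to carry.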

I know you're looking for speedups, and I appreciate that given the size of
our implementation, but the two-list version is plenty fast for our setup
now.  I don't know that the loss of convenience here is worth the potential
speedup.

Just my .02 as a user/implementor...



