[PATCH] combined wordlist a.k.a. single list

Greg Louis glouis at dynamicro.on.ca
Tue Jun 3 22:15:00 CEST 2003


On 20030603 (Tue) at 1334:35 -0500, Jeremy Blosser wrote:
> On May 31, David Relson [relson at osagesoftware.com] wrote:

> > This patch converts bogofilter so that it stores all the data in a single 
> > file, wordlist.db, with records containing token, spam count, nonspam count 
> > and (optionally) the timestamp.
> 
> From an administrative perspective I *much* prefer separate wordlists.  I'm
> sure the db utils would probably make it possible to split them up and
> manipulate them separately when required

Could you expand on this a little?  When would that be required?

> Further, we run a lot of automated stuff to do wordlist management,
> rebuilds, etc., and I want to keep those tools as simple as possible.

It's not obvious to me at all why keeping one copy of the token with
the two counts is more complex to manage.  I guess that's because I
don't understand why one would need to manipulate spam and nonspam
tokens separately.
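To make the single-list idea concrete, here is a minimal Python sketch. It assumes an in-memory dict stands in for the Berkeley DB file, and the names WordRecord, register, lookup and split are hypothetical, not bogofilter's code or on-disk format. Each token is stored once with both counts and an optional timestamp, one lookup returns both counts, and separate spam and nonspam lists can still be derived when someone needs them.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass
class WordRecord:
    """Hypothetical combined-wordlist entry: both counts live with the token."""
    spam: int = 0
    nonspam: int = 0
    timestamp: Optional[int] = None  # optional, as in the patch description


def register(wordlist: Dict[str, WordRecord], tokens: Iterable[str],
             is_spam: bool, today: Optional[int] = None) -> None:
    """Update counts for the tokens of one message.

    Assumption: each distinct token is counted once per message.
    """
    for tok in set(tokens):
        rec = wordlist.setdefault(tok, WordRecord())
        if is_spam:
            rec.spam += 1
        else:
            rec.nonspam += 1
        if today is not None:
            rec.timestamp = today


def lookup(wordlist: Dict[str, WordRecord], tok: str) -> Tuple[int, int]:
    """One lookup yields (spam, nonspam); a two-list layout needs two lookups."""
    rec = wordlist.get(tok)
    return (rec.spam, rec.nonspam) if rec else (0, 0)


def split(wordlist: Dict[str, WordRecord]) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Derive separate spam and nonspam lists from the combined one."""
    spam = {t: r.spam for t, r in wordlist.items() if r.spam}
    good = {t: r.nonspam for t, r in wordlist.items() if r.nonspam}
    return spam, good
```

For example, after registering ["cheap", "viagra", "cheap"] as spam and ["meeting", "cheap"] as nonspam, lookup(wl, "cheap") gives (1, 1) from a single record, and split(wl) still produces two per-class lists for any tool that expects them.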

A rebuild, for example, is run with the same commands as ever, except
three fewer, since you only have one list to dump and reload at the end;
the process is something like:
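  # feed the nonspam and spam corpora to bogofilter for registration
  cat nonspams/* | bogofilter -n
  cat spams/* | bogofilter -s
  # dump the single combined list and reload it into a fresh file
  bogoutil -d wordlist.db | bogoutil -l wordlist.new
  mv wordlist.new wordlist.db
  # check the integrity of the Berkeley DB file
  db_verify wordlist.db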

> I know you're looking for speedups, and appreciate that given the size of
> our implementation, but the two list version is plenty fast for our setup
> now.  I don't know that the loss of convenience here is worth the potential
> speed up.

I hear you, but I would like to understand the loss of convenience,
since my experience is the opposite.  I'm running one single-list and
two two-list installations at the moment, and I'm planning to convert
the two-list ones soon.  I know your environment is a lot bigger and
more complex than mine; that may be what's making the difference.

-- 
| G r e g  L o u i s          | gpg public key: finger     |
|   http://www.bgl.nu/~glouis |   glouis at consultronics.com |
| http://wecanstopspam.org in signatures fights junk email |




More information about the bogofilter-dev mailing list