question on multiple wordlists

David Relson relson at osagesoftware.com
Fri Oct 11 18:56:16 CEST 2002


At 11:51 AM 10/11/02, Eric Seppanen wrote:
>On Fri, Oct 11, 2002 at 10:30:03AM -0400, David Relson wrote:
> > Eric,
> >
> > I've been wondering about your plans for multiple word lists and hope you
> > can let us in on them a bit more.
>
>No problem.  All this stuff is negotiable, of course.
>
> > From your comments, I gather that the config file will be used to specify
> > the word lists (names, attributes, etc).  Can you give us some sample
> > lines?
>
>So far, all I've done is the ability to pull in additional DB files.  To
>do this, all you'd need to specify is a name, file, and weight, like
>
>wordlist systemspam /etc/bogofilter/bigspam.db -0.8
>wordlist systemgood /etc/bogofilter/biggood.db 0.8
>
>Though I'm thinking real hard about whether single-line whitespace-
>separated config lines are good enough.  I'm leaning toward committing the
>config-file stuff I've got; it's isolated from the rest of the code, so it
>shouldn't do any harm if it needs to be replaced later.

I think a config file like the one you describe would be good.  Single-line
entries and whitespace separation are fine and easy to work with.
Alternatives like property lists and XML are available, but they are
overkill for our needs.

Go ahead and commit the config-file stuff.  Having it available will be good.
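
As a rough illustration of how little code the single-line format needs, here is a minimal sketch in Python (not bogofilter's C code, and not the parser being committed).  It assumes each entry is exactly "wordlist <name> <file> <weight>" as in the two sample lines above.  The names WordlistEntry and parse_wordlist_config are hypothetical, and '#' comments are an assumption, not something the sample lines show.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WordlistEntry:
    name: str      # label for the list, e.g. "systemspam"
    path: str      # location of the DB file
    weight: float  # signed weight, as written in the sample lines


def parse_wordlist_config(text: str) -> List[WordlistEntry]:
    """Parse 'wordlist <name> <file> <weight>' lines; ignore other lines."""
    entries = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        # Assumption: '#' starts a comment. The original post does not say.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()  # whitespace-separated, so paths cannot contain spaces
        if fields[0] != "wordlist":
            continue  # other directives are out of scope for this sketch
        if len(fields) != 4:
            raise ValueError(
                f"line {lineno}: expected 'wordlist <name> <file> <weight>'"
            )
        entries.append(WordlistEntry(fields[1], fields[2], float(fields[3])))
    return entries


SAMPLE = """\
wordlist systemspam /etc/bogofilter/bigspam.db -0.8
wordlist systemgood /etc/bogofilter/biggood.db 0.8
"""

if __name__ == "__main__":
    for entry in parse_wordlist_config(SAMPLE):
        print(entry.name, entry.path, entry.weight)
```

The one visible cost of whitespace separation is that a file name containing spaces cannot be expressed without adding some quoting rule.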

> > How will multiple wordlists be maintained?  Currently bogofilter uses
> > switches to maintain goodlist.db and spamlist.db.  Single letter switches
> > don't extend to multiple lists very well.  What are your plans for this?

>Actually, I hadn't thought about it too hard.  I'd assume that it'd be
>straightforward to add an option to specify the list to update, like
>--wordlist <filename>

--wordlist <filename> would work well for simple updates, e.g. '-s' and 
'-n'.  For transfers ('-S' and '-N') two wordlists are needed.  We could 
add --spamlist <filename> and --goodlist <filename> for those.
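
To make the pairing of switches and list options concrete, here is a minimal sketch using Python's argparse (for illustration only; bogofilter itself is written in C and none of these long options exist yet).  It assumes '-s' and '-n' each need one --wordlist, while '-S' and '-N' need both --spamlist and --goodlist.  The program name and the parse_args helper are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="bogofilter-sketch")
    # Exactly one of the four update switches must be given.
    mode = parser.add_mutually_exclusive_group(required=True)
    for flag in ("s", "n", "S", "N"):
        mode.add_argument(
            f"-{flag}", dest="mode", action="store_const", const=flag
        )
    # Proposed long options; hypothetical, taken from this thread.
    parser.add_argument("--wordlist", metavar="FILE")
    parser.add_argument("--spamlist", metavar="FILE")
    parser.add_argument("--goodlist", metavar="FILE")
    return parser


def parse_args(argv):
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.mode in ("s", "n"):
        # Simple update: a single target list is enough.
        if not args.wordlist:
            parser.error(f"-{args.mode} requires --wordlist FILE")
    else:
        # Transfer: both the source and the destination list are needed.
        if not (args.spamlist and args.goodlist):
            parser.error(f"-{args.mode} requires --spamlist and --goodlist")
    return args


if __name__ == "__main__":
    args = parse_args(["-S", "--spamlist", "spam.db", "--goodlist", "good.db"])
    print(args.mode, args.spamlist, args.goodlist)
```

With this split, an invocation such as '-S --wordlist x.db' is rejected up front instead of silently updating only one list.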


> > I think I speak for all of us when I say that we're curious about the
> > direction you're going.
>
>The goals are:
>
>- allow user-specified lists so that new users can start off with someone
>else's spam list to get them started, but that "seed" list is stored
>separately from their list.
>
>- allow user-specified lists so that sysadmins can install system-wide
>lists that are available to all users, without interfering with the user's
>own lists.

The ability to have system lists and user lists is valuable.


>- allow use of plaintext lists (possibly requiring conversion to db
>format) for whitelisting, blacklisting, ignore-listing.

Yes!


>In general, I think user-specified lists are intended to be
>hand-maintained, so I don't think we need to worry about implementing the
>equivalent of -S, -N, -u for those lists.

Sounds reasonable and keeps it easy.  --wordlist with -s and -n will allow 
simple maintenance.  --spamlist and --goodlist used with -S and -N will 
allow correction of improper updates.


For summary digest subscription: bogofilter-digest-subscribe at aotto.com