specification of wordlist directory

David Relson relson at osagesoftware.com
Tue Dec 24 20:42:03 CET 2002


At 02:26 PM 12/24/02, Gyepi SAM wrote:

>On Tue, Dec 24, 2002 at 10:35:44AM -0500, David Relson wrote:
> > At 10:24 AM 12/24/02, Gyepi SAM wrote:
> > >This is all very complicated. How about a simpler idea:
> > >bogofilter uses a single primary directory for its wordlists. This is what
> > >happens now, except we don't call it that. Additional, secondary,
> > >directories can be added, either in a config file or on the command line.
> > >The key is that secondary directories are always considered read-only. All
> > >write operations are performed on the primary databases. This makes sense
> > >in the context of shared system-wide databases: in order to update the
> > >system databases, the sysadmin must specify their directory as the
> > >primary. For everyone else, the system directories are secondary.
> >
> > Good!  How does bogofilter know primary from secondary?
>
>Primaries could still use the -d option. I'd guess we have to choose
>another letter for secondaries. The config files would presumably need
>different ways to specify the two types.
>
> > Since bogofilter reads /etc/bogofilter.cf (system config file) before
> > reading ~/.bogofilter.cf (user config file), it seems that the directory
> > to update should be the last seen.
>
>Reading /etc/bogofilter.cf first is good. That way ~/.bogofilter.cf can
>override the global config as necessary. (Note that this would imply that
>there should be options to 'unset' previously set options, but that's a
>separate issue.) But relying on the order to determine which directory to
>update is not a good idea.
>Incidentally, if /etc/bogofilter.cf is always read, then the sysadmin
>cannot use it alone when updating the system-wide files: they'd need
>another config file as well, or some command line options. Aieee, this *is*
>too complicated.

Yep.  It's intricate (tricky?).  The config file is read after the command 
line is parsed so that the "-c filename" and "-C" options can take 
precedence.  Those options allow a single named config file (or no config 
file at all) to be used.
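
To make that precedence concrete, here is a minimal sketch in Python. It is an illustration only, not bogofilter's actual code: the function name is hypothetical, and letting "-C" win when both "-C" and "-c" appear is an assumption.

```python
SYSTEM_CONFIG = "/etc/bogofilter.cf"
USER_CONFIG = "~/.bogofilter.cf"


def config_files_to_read(argv):
    """Return the config files to read, in order, given the command line.

    Hypothetical helper: the command line is scanned first, so "-c filename"
    and "-C" are known before any config file is opened.
    """
    named = None       # file given with "-c filename", if any
    suppress = False   # True when "-C" (no config file at all) was given
    i = 0
    while i < len(argv):
        arg = argv[i]
        if arg == "-C":
            suppress = True
        elif arg == "-c" and i + 1 < len(argv):
            named = argv[i + 1]
            i += 1     # skip the filename operand
        i += 1

    if suppress:
        # Assumption: "-C" takes precedence if both options are present.
        return []
    if named is not None:
        return [named]
    # Default: system config first, then the user's config.
    return [SYSTEM_CONFIG, USER_CONFIG]
```

With neither option given, the list contains the system file followed by the user file, which matches the read order described in the quoted text above.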

>My philosophy about such things is: if it is hard to explain or takes too long
>to explain, no one will understand it. That means the default
>behaviour should be intuitive and easy to understand. Since most users
>will not require multiple wordlists, I'd say that the default behaviour
>(whatever it is) should be simple and straightforward. Advanced users and
>sysadmins can turn on the complicated behaviours *after* they read the
>manual. Of course, given the number of 'sysadmins' who aren't, I expect
>we'll still get a fair number of questions.
>Which is all the more reason to nail down an intuitive or easily explainable
>set of behaviours.

Here's my stab at reasonable behavior:

1 - collect all directories specified by "-d directory" command line 
options and "bogofilter_dir=path" config file lines.
2 - if no directories given in (1), check for environment variables 
BOGOFILTER_DIR and HOME.  Use the first one found.
3 - when computing spamicity score, use the wordlists in the directories of 
(1) and (2)
4 - when updating wordlists (-u, -s, -n, -S, -N options), use the last 
directory named in (1) and (2)

My next-best idea is to change (4) to:

4' - when updating wordlists, use the first directory named in (1) and (2)
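
The steps above, including the 4' variant, can be sketched in Python as follows. This is an illustration under stated assumptions, not bogofilter code: the function names are hypothetical, command-line directories are listed before config-file directories (the message does not fix that order), and the environment values are used as given.

```python
def collect_directories(cli_dirs, config_dirs, env):
    """Steps 1 and 2: gather the wordlist directories used for scoring (step 3).

    cli_dirs    -- directories from "-d directory" options, in order
    config_dirs -- directories from "bogofilter_dir=path" config lines, in order
    env         -- mapping of environment variables, e.g. os.environ
    """
    # Step 1. Assumption: command-line entries come before config-file entries.
    dirs = list(cli_dirs) + list(config_dirs)
    if dirs:
        return dirs
    # Step 2: only when step 1 found nothing, use the first variable that is set.
    for name in ("BOGOFILTER_DIR", "HOME"):
        value = env.get(name)
        if value:
            return [value]
    return []


def update_directory(dirs, rule="last"):
    """Step 4 (rule="last") or step 4' (rule="first"): pick the directory to update."""
    if not dirs:
        return None
    return dirs[-1] if rule == "last" else dirs[0]


dirs = collect_directories(["/var/lib/bogofilter"], ["/home/user/.bogofilter"], {})
print(update_directory(dirs, "last"))   # rule 4
print(update_directory(dirs, "first"))  # rule 4'
```

Either rule reads every collected directory when scoring; the two differ only in which single directory receives updates.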





