specification of wordlist directory
David Relson
relson at osagesoftware.com
Tue Dec 24 20:42:03 CET 2002
At 02:26 PM 12/24/02, Gyepi SAM wrote:
>On Tue, Dec 24, 2002 at 10:35:44AM -0500, David Relson wrote:
> > At 10:24 AM 12/24/02, Gyepi SAM wrote:
> > >This is all very complicated. How about a simpler idea:
> > >bogofilter uses a single primary directory for its wordlists. This is what
> > >happens now, except we don't call it that. Additional, secondary
> > >directories can be added, either in a config file or on the command line.
> > >The key is that secondary directories are always considered read-only. All
> > >write operations are performed on the primary databases. This makes sense
> > >in the context of shared system-wide databases: in order to update the
> > >system databases, the sysadmin must specify their directory as the primary.
> > >For everyone else, the system directories are secondary.
> >
> > Good! How does bogofilter know primary from secondary?
>
>Primaries could still use the -d option. I'd guess we have to choose
>another letter for secondaries. The config files would presumably need
>different ways to specify the two types.
>
> > Since bogofilter reads /etc/bogofilter.cf (system config file) before
> > reading ~/.bogofilter.cf (user config file), it seems that the directory
> > to update should be the last seen.
>
>Reading /etc/bogofilter.cf first is good. That way ~/.bogofilter.cf can
>override the global config as necessary. (Note that this would imply that
>there should be options to 'unset' previously set options, but that's a
>separate issue.) But relying on the order to determine which directory to
>update is not a good idea.
>Incidentally, if /etc/bogofilter.cf is always read then the sysadmin
>cannot use it alone when updating the system-wide files: they'd need
>another config file also, or some command line options. Aieee, this *is*
>too complicated.
Yep. It's intricate (tricky?). The config file is read after the command
line is parsed so that the "-c filename" and "-C" options can take
precedence. Those options allow a single named config file (or no config
file at all) to be used.
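
To make that concrete, here is a rough Python sketch of how the set of config files might be chosen once the command line has been parsed. The function name, the hand-rolled argument scan, and the rule that "-C" wins when both options appear are assumptions for illustration only, not bogofilter's actual code.

```python
def config_files_to_read(argv, home):
    """Return the config files to read, in order (hypothetical helper)."""
    named = None        # file given with "-c filename"
    suppress = False    # True when "-C" asks for no config file at all

    args = iter(argv)
    for arg in args:
        if arg == "-C":
            suppress = True
        elif arg == "-c":
            # A "-c" with no filename after it is ignored in this sketch.
            named = next(args, None)

    if suppress:
        # Assumption: "-C" takes precedence if both options are given.
        return []
    if named is not None:
        return [named]
    # Default: system config first, then the user's, so the user file
    # can override the system one.
    return ["/etc/bogofilter.cf", home + "/.bogofilter.cf"]
```

With no options this yields the system file followed by the user file; with "-c my.cf" it yields only my.cf; with "-C" it yields nothing.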
>My philosophy about such things is: if it is hard to explain or takes too long
>to explain, no one will understand it, which means that the default
>behaviour should be intuitive and easy to understand. Since most users
>will not require multiple wordlists, I'd say that the default behaviour
>(whatever it is) should be simple and straightforward. Advanced users and
>sysadmins can turn on the complicated behaviours *after* they read the
>manual. Of course, given the number of 'sysadmins' who aren't, I expect
>we'll still get a fair number of questions.
>Which is all the more reason to nail down an intuitive or easily explainable
>set of behaviours.
Here's my stab at reasonable behavior:
1 - collect all directories specified by "-d directory" command line
options and "bogofilter_dir=path" config file lines.
2 - if no directories given in (1), check for environment variables
BOGOFILTER_DIR and HOME. Use the first one found.
3 - when computing spamicity score, use the wordlists in the directories of
(1) and (2)
4 - when updating wordlists (-u, -s, -n, -S, -N options), use the last
directory named in (1) and (2)
My next best idea is to change (4) to:
4' - when updating wordlists, use the first directory named in (1) and (2)
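
The steps above can be sketched in Python roughly as follows. This is only an illustration of the proposal: the function name and parameters are made up, and the ordering of command-line entries before config-file entries is an assumption. The environment value is used as given, since the proposal doesn't say whether anything is appended to HOME.

```python
def resolve_wordlist_dirs(cli_dirs, config_dirs, env, update_rule="last"):
    """Return (score_dirs, update_dir) following steps 1-4 (hypothetical helper).

    cli_dirs    -- directories from "-d directory" options, in order
    config_dirs -- directories from "bogofilter_dir=path" lines, in read order
    env         -- mapping of environment variables
    update_rule -- "last" for rule (4), "first" for rule (4')
    """
    # Step 1: collect every directory named on the command line or in config.
    # Assumption: command-line entries are listed before config entries.
    dirs = list(cli_dirs) + list(config_dirs)

    # Step 2: only if step 1 found nothing, try the environment and
    # use the first variable that is set.
    if not dirs:
        for name in ("BOGOFILTER_DIR", "HOME"):
            value = env.get(name)
            if value:
                dirs = [value]
                break

    if not dirs:
        raise ValueError("no wordlist directory specified")

    # Step 3: scoring uses the wordlists in all collected directories.
    score_dirs = dirs

    # Step 4 / 4': updates go to a single directory, the last or the first.
    update_dir = dirs[-1] if update_rule == "last" else dirs[0]
    return score_dirs, update_dir
```

For example, with a system directory named first and a user directory named second, rule (4) updates the user directory while rule (4') updates the system one; scoring reads both either way.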