specification of wordlist directory
relson at osagesoftware.com
Tue Dec 24 08:47:58 EST 2002
At 07:24 AM 12/24/02, Greg Louis wrote:
>On 20021223 (Mon) at 2312:36 -0500, David Relson wrote:
> > 1. If BOGOFILTER_DIR is defined, remember its value.
> > 2. If not defined, check for HOME and remember its value.
> > 3. When parsing the command line, "-d directory" can supersede any prior
> > value.
> > So, neither environment variable is required and bogofilter shouldn't
> > complain if neither is defined. Bogofilter _should_ complain if there's no
> > directory specified when it's time to open the wordlists.
> > There's an additional, older idea still around. Bogofilter has code to
> > work with a list of wordlists, i.e. more than just the two normally
> > used. At one time, using multiple "-d directory" switches on the command
> > line would add pairs of good/spam wordlists for use in calculating a
> > spamicity score. We could allow multiple config file lines to name
> > directories. Doing this would allow an admin to configure a system
> > wordlist plus allowing users to have their own wordlists. Do we want to
> > disable this or to fully enable it?
>I'd vote for enabling; so far I haven't been approached by any power
>users who want their own lists, but it would be nice not to have to run
>bogofilter twice for such people. What happens with -n and friends,
>though, if there are several directory lines in the config file?
I've continued to think about this and have newer, somewhat different
thoughts. I no longer like the idea of a three-level precedence based on
environment, command line, and config file (in whatever order). I'm
leaning towards using _all_ directory names from the command line and
config file. If neither of these gives directory info, check the
environment. When classifying messages, _all_ the wordlists in all the
directories would be used.
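
A minimal sketch of the lookup described above, in Python purely for
illustration (bogofilter itself is written in C). The function name and
arguments are hypothetical. Checking BOGOFILTER_DIR before HOME is an
assumption carried over from the earlier numbered list, and the error
stands in for the "complain if no directory is specified" behaviour.

```python
def collect_wordlist_dirs(named_dirs, environ):
    """Return every directory whose wordlists are used for classification.

    named_dirs: directories from the command line and config files,
                in the order they were encountered (hypothetical input).
    environ:    a mapping of environment variables, e.g. os.environ.
    """
    # Use all directories that were named explicitly, with no precedence.
    if named_dirs:
        return list(named_dirs)

    # Only when nothing was named, fall back to the environment.
    # Assumed order: BOGOFILTER_DIR first, then HOME.
    for var in ("BOGOFILTER_DIR", "HOME"):
        value = environ.get(var)
        if value:
            return [value]

    # Nothing anywhere: complain when it is time to open the wordlists.
    raise ValueError("no wordlist directory specified")
```

Called with two named directories, this returns both and ignores the
environment; called with an empty list, it returns a single directory
taken from the environment.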
Writing wordlists is a bit different. The simple case is when only one
directory is specified. It's obvious what to do.
More complex is when several directories are named, as for the power
user(s) mentioned above. I'd expect their environment to use the system
wordlists as well as private user wordlists (though some might just use
their private lists). In either case, bogofilter would (could) be run from
their .procmailrc and their config will necessarily supply the info
bogofilter needs. I'd expect the common usage would be to update the user's
wordlists. Since bogofilter reads the system config file before the user
config file, the last directory name seen is the one to use for updating.
For updating the system wordlists, the sys admin could (should) give
bogofilter exactly one directory. "bogofilter -d directory -C" (which
doesn't read a config file) would do the task or "bogofilter -c configfile"
(which reads only the named config file) would also work.
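
The update rule can be sketched the same way, again in Python for
illustration only. The function name and the paths are made up; the only
rule taken from the discussion above is "the last directory seen is the
one written to".

```python
def update_directory(dirs_in_order):
    """Pick the directory whose wordlists are written during an update.

    dirs_in_order lists directories in the order they were encountered,
    e.g. system config file first, then the user's config file.
    """
    if not dirs_in_order:
        raise ValueError("no wordlist directory specified")
    # The last directory name seen is the one to update.
    return dirs_in_order[-1]


# Hypothetical power user: system wordlists named first, private ones last.
power_user = ["/var/lib/bogofilter", "/home/alice/.bogofilter"]

# Hypothetical sys admin run that names exactly one directory.
sys_admin = ["/var/lib/bogofilter"]
```

With these inputs the power user's update goes to the private directory,
and the sys admin's update goes to the single system directory.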
Conclusion: when writing wordlists use the last path encountered during