specification of wordlist directory

David Relson relson at osagesoftware.com
Tue Dec 24 14:47:58 CET 2002


At 07:24 AM 12/24/02, Greg Louis wrote:

>On 20021223 (Mon) at 2312:36 -0500, David Relson wrote:
>
> > 1. If BOGOFILTER_DIR is defined, remember its value.
> > 2. If not defined, check for HOME and remember its value.
> > 3. When parsing the command line, "-d directory" can supersede any prior
> > value.
> >
> > So, neither environment variable is required and bogofilter shouldn't
> > complain if neither is defined.  Bogofilter _should_ complain if there's
> > no directory specified when it's time to open the wordlists.
>
>Good point.
>
> > There's an additional, older idea still around.  Bogofilter has code to
> > work with a list of wordlists, i.e. more than just the two normally
> > used.  At one time, using multiple "-d directory" switches on the command
> > line would add pairs of good/spam wordlists for use in calculating a
> > word's spamicity score.  We could allow multiple config file lines to name
> > directories.  Doing this would allow an admin to configure a system
> > wordlist plus allowing users to have their own wordlists.  Do we want to
> > disable this or to fully enable it?
>
>I'd vote for enabling; so far I haven't been approached by any power
>users who want their own lists, but it would be nice not to have to run
>bogofilter twice for such people.  What happens with -n and friends,
>though, if there are several directory lines in the config file?

I've continued to think about this and have newer, somewhat different 
thoughts.  I no longer like the idea of a three level precedence based on 
environment, command line, and config file (in whatever order).  I'm 
leaning towards using _all_ directory names from the command line and 
config file.  If neither supplies a directory, bogofilter would fall back 
to the environment.  When classifying messages, _all_ the wordlists in all 
the directories would be used.
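
To make the lookup order concrete, here is a rough sketch in Python (not 
bogofilter's actual C code).  The function names and the single "named_dirs" 
list are hypothetical, and checking BOGOFILTER_DIR before HOME is an 
assumption carried over from the earlier proposal quoted above.

```python
import os


def collect_wordlist_dirs(named_dirs, environ=None):
    """Return the wordlist directories to consult, in the order seen.

    named_dirs: directories named on the command line and in config
                files, in the order encountered during startup
                (hypothetical input; the real parser is not shown).
    environ:    mapping to use instead of os.environ (handy for tests).
    """
    env = os.environ if environ is None else environ
    # Use _all_ explicitly named directories, skipping empty entries.
    dirs = [d for d in named_dirs if d]
    if not dirs:
        # Only when nothing was named do we look at the environment.
        # Assumed order: BOGOFILTER_DIR first, then HOME.
        for var in ("BOGOFILTER_DIR", "HOME"):
            value = env.get(var)
            if value:
                dirs.append(value)
                break
    return dirs


def require_dirs(dirs):
    """Complain only when it is time to open the wordlists."""
    if not dirs:
        raise RuntimeError("no wordlist directory specified")
    return dirs
```

Note that an empty result is not an error at collection time; the complaint 
is deferred until the wordlists are about to be opened.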

Writing wordlists is a bit different.  The simple case is when only one 
directory is specified:  the wordlists in that directory get updated.

More complex is when several directories are named, as for the power 
user(s) mentioned above.  I'd expect their environment to use the system 
wordlists as well as private user wordlists (though some might use just 
their private lists).  In either case, bogofilter would (could) be run from 
their .procmailrc and their config will necessarily supply the info 
bogofilter needs.  I'd expect the common usage would be to update the 
user's wordlists.  Since bogofilter reads the system config file before the 
user config file, the user's directory is seen last, so the last directory 
name seen is the one to use for updating.

For updating the system wordlists, the sys admin could (should) give 
bogofilter exactly one directory.  Either "bogofilter -d directory -C" 
(which doesn't read a config file) or "bogofilter -c configfile" (which 
reads only the named config file) would do the task.

Conclusion:  when writing wordlists, use the last directory encountered 
during bogofilter startup.
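
As a rough illustration of that rule, here is a sketch in Python (again, 
not bogofilter's actual code).  The function name and both example paths are 
made up; the only thing taken from the proposal is "read from all, write to 
the last one seen".

```python
def split_read_write(dirs):
    """Return (read_dirs, write_dir) for an ordered list of directories.

    dirs is assumed to be in the order the directories were encountered
    during startup, e.g. system config first, then user config.
    """
    if not dirs:
        raise ValueError("no wordlist directory specified")
    # Classification consults every directory; updates go to the last one.
    return list(dirs), dirs[-1]


# Hypothetical setup: a system wordlist directory plus a user's private one.
read_dirs, write_dir = split_read_write(
    ["/var/lib/bogofilter", "/home/alice/.bogofilter"]
)
print(write_dir)
```

With a single directory, the same rule covers the sys admin case:  the one 
directory given is both read and updated.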





