support for multiple wordlists
relson at osagesoftware.com
Mon May 17 20:00:24 EDT 2004
On Mon, 17 May 2004 19:45:29 -0400
Tom Allison wrote:
> David Relson wrote:
> > On Mon, 17 May 2004 17:57:06 -0400
> > Tom Allison wrote:
> >>David Relson wrote:
> >>commas and such make me dizzy. The order numbers (5,6,7,8) might be
> >>better replaced with a single parameter (e.g. wordlist_order). Also,
> >>the terms you've described above may have redundancy. Isn't the
> >>ignore and 'R' redundant? What happens when I have
> >>"wordlist=ignore, ~/ignorelist.db, 7, R" ??? (See your NOTE 3)
> > spaces instead of commas would be fine.
> >>wordlist_user= ~/.bogofilter/wordlist.db
> >>wordlist_order= global ignore user (whitespace separated: " " or
> > In bogofilter's config file processing code, all lines are of the
> > form "key=value(s)" and there's a list of valid keys. Having
> > "key_name=value", "key_name2=value", is a problem.
> I wasn't aware of this. I don't recall seeing any examples of this.
> I was thinking if you did this approach of using a completely distinct
> key for each of the three types of wordlists you presented, then I
> assumed it would be trivial to modify them in the command line with
> --wordlist_user=... similar to how you can modify min_dev et al.
> In this sense, you would simply add four new parameters
> (wordlist_user, wordlist_ignore, wordlist_global, wordlist_order) to
> bogofilter.cf with them defaulting to today's structure of:
> wordlist_order=user ignore global (this doesn't matter as there's
> only one!)
> Or am I missing the idea that you might have many wordlists, not just
> the three you proposed?
I'm not setting limits on the number of wordlists used. If someone
wishes to use many at once, I'm providing the opportunity.
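As a rough illustration of the "key=value(s)" constraint quoted above, here is a minimal sketch of a parser that accepts a line only when its key is in a fixed list. The function name demo_parse_line and the contents of valid_keys are hypothetical and are not bogofilter's actual code. It shows why one repeatable "wordlist" key fits such a scheme, while "wordlist_user" is rejected until it is added to the list.

```c
#include <string.h>

/* Hypothetical key list for illustration; bogofilter's real list differs. */
static const char *const valid_keys[] = { "wordlist", "min_dev", "robx", "robs" };

/*
 * Split "key=value(s)" in place. On success, returns 0 and sets *key and
 * *value to point into line. Returns -1 if there is no '=' or the key is
 * not in valid_keys. No whitespace trimming is done in this sketch.
 */
int demo_parse_line(char *line, char **key, char **value)
{
    char *eq = strchr(line, '=');
    size_t i;

    if (eq == NULL)
        return -1;
    *eq = '\0';   /* terminate the key so it can be compared */
    for (i = 0; i < sizeof valid_keys / sizeof valid_keys[0]; i++) {
        if (strcmp(line, valid_keys[i]) == 0) {
            *key = line;
            *value = eq + 1;
            return 0;
        }
    }
    *eq = '=';    /* restore the line when the key is rejected */
    return -1;
}
```

Calling demo_parse_line() once per config line lets "wordlist=..." appear any number of times without the list of valid keys growing.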
> > Also, having a separate order precludes additive operations.
> >>This provides specific exclusion from the checks of 'ignore' and 'R'
> >>being required. And the order of precedence appears in one line of
> >>configuration file and not across 3 (or more if you have a lot of
> >>REM'ed lines for old stuff)
> >>How would you affect the separate wordlists for configurations
> >>(min_dev, threshold, robx... bogotune stuff)? I think this only
> >>applies to global/user lists.
> > There's no effect. The scoring parameters are applied separately
> > from finding tokens in the wordlist(s).
> So you would have one set of min_dev/robx/robs for both global and
> user wordlists? I would think this could cost you a lot of
> I'm thinking ahead and would see an application for this where
> everyone on a mail server would access a global wordlist that is
> administrator managed with something like PI's train on error or
> something very "lean" because it will have to accommodate a lot of
> personal variations. The '-u' would not be used here.
> Subsequently, each user who was interested would have their own user
> wordlist (wordlist_user is defined) and could use '-u' and have more
> training effects on this one as well.
Yikes! Having separate parameter sets for each wordlist would be a
> > Command line parsing uses library function getopt() and optional
> > parameters are a problem. Given the number of platforms which run
> > bogofilter and the many variants of getopt(), using optional
> > parameters is a no-no.
> I don't understand this.
> Do you mean 'bogofilter -Sn wordlist_user wordlist_global' is bad?
> Could you do: 'bogofilter -Sn wordlist_user -Sn wordlist_global'
> without a 'no-no'? Or does the duplication of '-Sn' really send
> things over the edge.
The functions for parsing command line options work best when an option,
say '-Z', either never has an argument or always has an argument. Having
'-Z' sometimes with an argument and sometimes without an argument leads
to portability problems.
Updating more than 1 wordlist at a time is a no-no. If you need to
change two, use:
bogofilter -Sn wordlist_user < message
bogofilter -Sn wordlist_global < message
Registering the same message in multiple wordlists seems odd to me. A
user shouldn't be updating the system list and the sysadmin shouldn't be
updating the user's list.
> I thought you could do this based on the manpage use of bogotune -n
> implying that you could have multiple directories/files listed after
> the -n and similarly for the -s. My assumption was that the code was
Bogotune only uses 1 wordlist. It uses -n and -s to allow specification
of multiple message files for scoring/tuning.
> I was hoping to find a more straightforward approach to representing
> the different filenames/locations. I guess it depends on where you
> want to save the information about the wordlist. You do it as part of
> the definition of the key, "wordlist"; I was doing it as the name of the
> key, "wordlist_user".
There may be a more straightforward approach. Having lots of parameters
is complex any way you slice it.