Tracking metadata and other options (was: token degeneration)

Jake Di Toro karrde+bogofilter at viluppo.net
Tue Jul 29 20:32:49 CEST 2003


On Tue, Jul 29, 2003 at 11:05:59AM -0400, David Relson wrote:
> To get back to bogofilter, I've added degeneration capabilities.  I've also 
> provided options to turn them on or off since the effort to match an 
> unknown token can be time consuming.  Here are the new command line flags 
> for degeneration:
> 
> -PD - disable degeneration (default)
> -Pd - enable degeneration
> -Pf - enable first match (default)
> -PF - enable best indicator
> 
> To explain a bit more...

Hmmmm, seems like a good time to bring up this idea that's been
floating around.  

Metadata tokens are something that's been in the back of my head for a
while.  When the question of how to handle "text" vs "text/" came up, I
was of the attitude "go for it", but didn't say anything.  When the
conversation about adding new features came up, the answer was "write
the code yourself, test it, and if it proves to discriminate well it can
be added".  

And here comes along another new feature, with yet more command line
parameters.  

Two things come to mind.  One, we're going to run out of command line
characters eventually.  Two, does something this "esoteric" really
need a command line option?  Let's face it: David, pi, and Greg are
going to be some of the few who actually do extensive testing on
this feature over the next few days.  They will determine whether this
feature is beneficial or not, set the default appropriately, and then
90%+ of the users will never look at or think of the option again.  Which
leads me to think: wouldn't something like this be better suited
as a config file option as opposed to a command line one?

The only objection I see is from the testing standpoint: it's easier
to use command line options than config files.  How about implementing
something along the lines of:

	  'bogofilter -d . -c config --option "case_folding=no;  degeneration=yes"'

where you can specify config file parameters on the command line
that will override those in the file specified by '-c'.  Or does that
already exist??
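
To make the override idea concrete, here is a minimal sketch in Python
(not bogofilter's actual code).  It assumes a config file made of
"name=value" lines with "#" comments, and a hypothetical --option
string of "name=value" pairs separated by ";".  The function names are
made up for illustration, and "case_folding" and "degeneration" are
just the example names from the command line above.

```python
def parse_pairs(items):
    """Turn an iterable of 'name=value' strings into a dict.

    Blank items are skipped; anything without '=' or without a name is
    rejected so a typo does not get silently ignored.
    """
    settings = {}
    for item in items:
        item = item.strip()
        if not item:
            continue
        name, sep, value = item.partition("=")
        name = name.strip()
        if not sep or not name:
            raise ValueError("expected name=value, got %r" % item)
        settings[name] = value.strip()
    return settings


def parse_config_text(text):
    """Parse config file contents (assumed format: name=value, '#' comments)."""
    lines = (line.split("#", 1)[0] for line in text.splitlines())
    return parse_pairs(lines)


def parse_option_string(option):
    """Parse a hypothetical --option argument such as 'a=1;  b=2'."""
    return parse_pairs(option.split(";"))


def effective_settings(config_text, option_strings):
    """Config file values first, then command line overrides in order."""
    settings = parse_config_text(config_text)
    for option in option_strings:
        settings.update(parse_option_string(option))
    return settings


if __name__ == "__main__":
    config = "case_folding=yes\ndegeneration=no  # off by default\n"
    print(effective_settings(config, ["case_folding=no;  degeneration=yes"]))
```

Since the overrides are applied after the file is read, a tester can
keep one config file and vary a single parameter per run, while normal
users never need to touch the command line.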

In the end I think this would allow people to add more
discriminators that could be carried in the mainline even when
they are of marginal value.  Just have a policy that they have to go
in w/ no obvious bugs, and that the person must be willing to maintain
the code.  If at a later date a test is causing problems and the person
is unable or unwilling to maintain the rule/test, it will be removed.
What has been sacrificed, other than a unique config file option name?
And non-conflicting names are easier to come up with for config options
than for single/two character command line options.

Maybe eventually tests could be configure options at compile time to
make your bogofilter nice and lean....  But that's a different topic.

-- 
Till Later, Jake <karrde+bogofilter at viluppo.net>
-----------------------------------------------
Direct replies are likely to be flagged as spam.
Drop the +addy if you need to reply direct.



